Abstract In [6] the authors give an algorithm for answering conjunctive
queries over ALCNR knowledge bases which is in coNP in data complexity.
Their technique is based on the tableau technique for checking
satisfiability in ALCNR presented in [2]. In their algorithm, the blocking
conditions of [2] are weakened in such a way that the set of models their
algorithm yields suffices to check query entailment. The algorithm we
propose consists in applying a similar technique to the tableau algorithm
in [4], which decides the satisfiability of SHIQ knowledge bases.
As a result we have an algorithm for answering conjunctive queries over
SHIQ knowledge bases that is also in coNP in terms of data complexity.

1 Introduction 

The idea of using description logic (DL) knowledge bases to represent the con- 
ceptual view of data repositories is becoming popular nowadays. In the context of 
large data repositories with a fixed schema, query answering becomes a key issue 
and the size of the data is the main parameter for measuring complexity. While 
atomic queries (A-Box reasoning) have always been considered an essential
reasoning task in description logics, conjunctive queries and other kinds of
queries have recently become a topic of interest. Data complexity of query
answering over DL knowledge bases was already studied in [7]. Many of the
existing results concern the fragments of DLs for which the problem remains
polynomial, and the LogSpace boundary of such logics has been studied in detail
in [3]. It is known that for rather simple DLs, even less expressive than ALE,
the problem is already coNP-hard [7,3]. However, results concerning complexity
upper bounds are scarce. In [6] the authors prove that answering conjunctive
queries over ALCNR knowledge bases is in coNP w.r.t. data complexity and
they provide a worst-case optimal algorithm for solving the problem. In this
work, we address the same problem for more expressive DLs, namely ones that
have inverse roles and role hierarchies. In [5], a data complexity coNP upper
bound for ground atomic queries over SHIQ knowledge bases is given, but their
technique does not yield such an upper bound for conjunctive queries.

In this work we use a tableau algorithm. The algorithm proposed in [6] is
based on the tableau technique for checking satisfiability in ALCNR presented
in [2]. The key issue is that the blocking conditions of [2] are weakened in such a
way that it can be ensured that the query is entailed by the knowledge base iff it
is entailed by the models obtained via this algorithm. The algorithm we propose
consists basically in applying the same technique to the tableau algorithm
in [4], which decides the satisfiability of a SHIQ knowledge base. As a result
we have an algorithm for answering conjunctive queries over SHIQ knowledge
bases that is in coNP in data complexity.

2 Preliminaries 

2.1 SHIQ Knowledge Bases

The syntax and semantics of SHIQ are defined in the standard way.

Definition 1 (SHIQ knowledge base). Let $\mathbf{C}$ be a set of concept names and
$\mathbf{R}$ a set of role names with a subset $\mathbf{R}_+ \subseteq \mathbf{R}$ of transitive role names. The set
of roles is $\mathbf{R} \cup \{R^- \mid R \in \mathbf{R}\}$. The functions Inv and Trans are defined on roles.
Inv is defined as $\mathrm{Inv}(R) = R^-$ and $\mathrm{Inv}(R^-) = R$ for any role name $R$. Trans is
a boolean function, $\mathrm{Trans}(R) = \mathit{true}$ iff $R \in \mathbf{R}_+$ or $\mathrm{Inv}(R) \in \mathbf{R}_+$.

A role inclusion axiom is an expression of the form $R \sqsubseteq S$ where $R$ and $S$ are
roles. A role hierarchy is a set of role inclusion axioms. The relation $\sqsubseteq^*$ denotes
the transitive closure of $\sqsubseteq$ over a role hierarchy
$\mathcal{R} \cup \{\mathrm{Inv}(R) \sqsubseteq \mathrm{Inv}(S) \mid R \sqsubseteq S \in \mathcal{R}\}$.
We say that $R$ is a sub-role of $S$ when $R \sqsubseteq^* S$, and a super-role of $S$ when
$S \sqsubseteq^* R$. We will assume that it is never the case that $R$ is both a sub-role and
a super-role of $S$.¹ A role is simple if it is neither transitive nor has transitive
sub-roles.
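
The notions of sub-role and simple role can be illustrated with a small Python sketch. It is only an illustration under assumptions that are not part of the original text: roles are encoded as strings, an inverse role carries a trailing "-", and the function names `inv`, `sub_role_closure`, `trans` and `is_simple` are hypothetical.

```python
from itertools import product


def inv(role):
    """Inv(R) = R- and Inv(R-) = R; inverse roles carry a trailing '-'."""
    return role[:-1] if role.endswith("-") else role + "-"


def sub_role_closure(hierarchy):
    """Transitive closure of the inclusions (R, S), read as 'R is included in S',
    after adding Inv(R) included in Inv(S) for every given inclusion."""
    closure = set(hierarchy) | {(inv(r), inv(s)) for r, s in hierarchy}
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so that the set can grow inside the loop.
        for (r, s), (s2, t) in product(list(closure), repeat=2):
            if s == s2 and (r, t) not in closure:
                closure.add((r, t))
                changed = True
    return closure


def trans(role, transitive_names):
    """Trans(R) holds iff R or Inv(R) is a transitive role name."""
    return role in transitive_names or inv(role) in transitive_names


def is_simple(role, hierarchy, transitive_names):
    """A role is simple iff it is not transitive and has no transitive sub-role."""
    if trans(role, transitive_names):
        return False
    closure = sub_role_closure(hierarchy)
    return not any(s == role and trans(r, transitive_names) for r, s in closure)
```

For instance, with the inclusions P ⊑ Q and Q ⊑ S and P transitive, neither S nor its inverse is simple, so neither may occur in a number restriction.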

The set of SHIQ concepts is the smallest set such that:

- Every concept name is a concept,

- If $C$ and $D$ are concepts, $R$ is a role, $S$ is a simple role and $n$ is a
non-negative integer, then $C \sqcap D$, $C \sqcup D$, $\neg C$, $\forall R.C$, $\exists R.C$, $\geq n\, S.C$, $\leq n\, S.C$
are concepts.

A concept inclusion axiom is an expression of the form $C \sqsubseteq D$ for two
concepts $C$ and $D$. A terminology or T-Box is a set of concept inclusion axioms.

Let $\mathbf{I}$ be a set of individual names. An assertion is an expression that can
have the form $C(a)$, $R(a,b)$ or $a \not\approx b$ where $C$ is a concept, $R$ is a role and
$a, b \in \mathbf{I}$. An A-Box is a set of assertions.

A SHIQ knowledge base is a triple $K = (\mathcal{A}, \mathcal{R}, \mathcal{T})$, where $\mathcal{A}$ is an A-Box,
$\mathcal{R}$ is a role hierarchy and $\mathcal{T}$ is a terminology.

The semantics of SHIQ knowledge bases is given by interpretations.

1 This consideration is done for practical purposes, however it does not restrict the 
expressiveness of the language. It is clear that if R is at the same time a sub-role 
and a super-role of S both roles will have the same extension and one of them can 
be eliminated by replacing it by the other. 
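Definition 2 (Interpretation). An interpretation $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ is defined for
a set of individual names $\mathbf{I}$, a set of concepts $\mathbf{C}$ and a set of roles $\mathbf{R}$. The set
$\Delta^{\mathcal{I}}$ is called the domain of $\mathcal{I}$. The valuation $\cdot^{\mathcal{I}}$ maps each individual name in $\mathbf{I}$ to
an element in $\Delta^{\mathcal{I}}$, each concept in $\mathbf{C}$ to a subset of $\Delta^{\mathcal{I}}$, and each role in $\mathbf{R}$ to
a subset of $\Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$. Additionally, for any concepts $C$, $D$, any role $R$ and any
non-negative integer $n$, the valuation $\cdot^{\mathcal{I}}$ must satisfy the following equations:

$$
\begin{aligned}
R^{\mathcal{I}} &= (R^{\mathcal{I}})^+ \quad \text{for each role } R \in \mathbf{R}_+ \\
(R^-)^{\mathcal{I}} &= \{(y, x) \mid (x, y) \in R^{\mathcal{I}}\} \\
(C \sqcap D)^{\mathcal{I}} &= C^{\mathcal{I}} \cap D^{\mathcal{I}} \\
(C \sqcup D)^{\mathcal{I}} &= C^{\mathcal{I}} \cup D^{\mathcal{I}} \\
(\neg C)^{\mathcal{I}} &= \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}} \\
(\forall R.C)^{\mathcal{I}} &= \{x \mid \text{for all } y,\ (x, y) \in R^{\mathcal{I}} \text{ implies } y \in C^{\mathcal{I}}\} \\
(\exists R.C)^{\mathcal{I}} &= \{x \mid \text{for some } y,\ (x, y) \in R^{\mathcal{I}} \text{ and } y \in C^{\mathcal{I}}\} \\
(\geq n\, R.C)^{\mathcal{I}} &= \{x \mid |\{y \mid (x, y) \in R^{\mathcal{I}} \text{ and } y \in C^{\mathcal{I}}\}| \geq n\} \\
(\leq n\, R.C)^{\mathcal{I}} &= \{x \mid |\{y \mid (x, y) \in R^{\mathcal{I}} \text{ and } y \in C^{\mathcal{I}}\}| \leq n\}
\end{aligned}
$$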
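
The equations of Definition 2 can be read as a recursive procedure that computes the extension of a concept in a finite interpretation. The following Python sketch illustrates this reading. The encoding is an assumption made only for this example: concepts are nested tuples such as `("some", "r", ("name", "A"))`, inverse roles carry a trailing "-", and `role_ext` and `ext` are hypothetical names. The sketch does not enforce the condition on transitive roles; the caller is assumed to supply extensions that already satisfy it.

```python
def role_ext(role, roles):
    """Extension of a role; an inverse role 'r-' is the converse of 'r'."""
    if role.endswith("-"):
        return {(y, x) for (x, y) in roles.get(role[:-1], set())}
    return set(roles.get(role, set()))


def ext(concept, domain, concepts, roles):
    """Extension of a concept, following the equations of Definition 2."""
    op = concept[0]
    if op == "name":
        return set(concepts.get(concept[1], set()))
    if op == "not":
        return set(domain) - ext(concept[1], domain, concepts, roles)
    if op == "and":
        return ext(concept[1], domain, concepts, roles) & ext(concept[2], domain, concepts, roles)
    if op == "or":
        return ext(concept[1], domain, concepts, roles) | ext(concept[2], domain, concepts, roles)
    if op in ("all", "some"):
        r = role_ext(concept[1], roles)
        c = ext(concept[2], domain, concepts, roles)
        successors = {x: {y for (a, y) in r if a == x} for x in domain}
        if op == "all":
            return {x for x in domain if successors[x] <= c}
        return {x for x in domain if successors[x] & c}
    if op in ("atleast", "atmost"):
        n = concept[1]
        r = role_ext(concept[2], roles)
        c = ext(concept[3], domain, concepts, roles)
        count = {x: len({y for (a, y) in r if a == x and y in c}) for x in domain}
        if op == "atleast":
            return {x for x in domain if count[x] >= n}
        return {x for x in domain if count[x] <= n}
    raise ValueError("unknown concept constructor: %r" % (op,))
```

Note that an element without any R-successor belongs to the extension of every concept of the form ∀R.C, as the equation for the universal restriction requires.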



Definition 3 (Model of a knowledge base). An interpretation $\mathcal{I}$ satisfies
an assertion $A$ iff:

- $a^{\mathcal{I}} \in C^{\mathcal{I}}$, if $A$ is of the form $C(a)$,
- $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in R^{\mathcal{I}}$, if $A$ is of the form $R(a, b)$,
- $a^{\mathcal{I}} \neq b^{\mathcal{I}}$, if $A$ is of the form $a \not\approx b$.

An interpretation $\mathcal{I}$ satisfies an A-Box $\mathcal{A}$ if it satisfies every assertion in $\mathcal{A}$.
$\mathcal{I}$ satisfies a role hierarchy $\mathcal{R}$ if $R^{\mathcal{I}} \subseteq S^{\mathcal{I}}$ for every $R \sqsubseteq S$ in $\mathcal{R}$. $\mathcal{I}$ satisfies a
terminology $\mathcal{T}$ if $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ for every $C \sqsubseteq D$ in $\mathcal{T}$. $\mathcal{I}$ is a model of $K = (\mathcal{A}, \mathcal{R}, \mathcal{T})$
if it satisfies $\mathcal{A}$, $\mathcal{R}$ and $\mathcal{T}$.

A SHIQ concept is said to be in negation normal form (NNF) if negation
occurs only in front of concept names. Since concepts can be translated into
NNF in linear time [4], we will assume that all concepts are in NNF. We denote
by $\mathrm{NNF}(\neg C)$ the NNF of the concept $\neg C$. The closure of a concept $\mathrm{clos}(C)$ is
the smallest set containing $C$ that is closed under subconcepts and negation (in
NNF). For a knowledge base $K$, $\mathrm{clos}(K) = \bigcup_{C(a) \in K} \mathrm{clos}(C)$.

Global Constraint Concepts.

A knowledge base $K$ has an associated set of concepts that we will call the
global constraint concepts of $K$. This set contains two kinds of concepts:

- For each concept inclusion axiom $C \sqsubseteq D$ in the T-Box, there is a global
constraint concept of the form $\neg C \sqcup D$. This way, if we ensure that all individuals
in a model belong to the extension of the global constraint concepts, the model
will satisfy the T-Box of $K$.²

² In [4] the authors consider an internalised T-Box. We do not make this assumption.
- We will consider that $K$, in addition to the A-Box, T-Box and R-Box, might
have a set of distinguished concept names that we will denote $\mathbf{C}_K$. In order
not to make the notation too cumbersome, we will not denote it explicitly as
a part of $K$. For all concept names $C$ in $\mathbf{C}_K$ the concept $C \sqcup \neg C$ belongs to the
global constraints of $K$. In the algorithm we present in the following sections,
we will use partial representations of models of a knowledge base to verify
whether some formula $Q$ is entailed in them. In these partial representations
it may remain undecided whether some individuals belong to the extension
of a concept or of its negation. However, for the concepts that appear in
$Q$, we want to ensure that the decision is taken. We will later see that in
our framework, the set $\mathbf{C}_K$ will be used to represent the concepts that may
appear in the queries to be answered.³

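Definition 4 (Global Constraint Concepts). Given a knowledge base $K = (\mathcal{A}, \mathcal{R}, \mathcal{T})$
and a set of distinguished concept names $\mathbf{C}_K$, the set of global
constraint concepts for $K$ and $\mathbf{C}_K$ is defined as
$\mathrm{const}(K, \mathbf{C}_K) = \{\neg C \sqcup D \mid C \sqsubseteq D \in \mathcal{T}\} \cup \{C \sqcup \neg C \mid C \in \mathbf{C}_K\}$.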
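
As an illustration of the NNF translation and of Definition 4, the following Python sketch computes the set of global constraint concepts from a T-Box and a set of distinguished concept names. The tuple encoding of concepts and the names `nnf` and `global_constraints` are assumptions of this example only. For a negated restriction ≥ 0 S.C, which is unsatisfiable, the sketch returns the unsatisfiable concept C ⊓ ¬C, which is one possible choice.

```python
def nnf(c):
    """Negation normal form: negation is pushed inwards until it only
    occurs in front of concept names."""
    op = c[0]
    if op == "name":
        return c
    if op in ("and", "or"):
        return (op, nnf(c[1]), nnf(c[2]))
    if op in ("all", "some"):
        return (op, c[1], nnf(c[2]))
    if op in ("atleast", "atmost"):
        return (op, c[1], c[2], nnf(c[3]))
    # Remaining case: op == "not". Push the negation one level inwards.
    d = c[1]
    dop = d[0]
    if dop == "name":
        return c
    if dop == "not":
        return nnf(d[1])
    if dop == "and":
        return ("or", nnf(("not", d[1])), nnf(("not", d[2])))
    if dop == "or":
        return ("and", nnf(("not", d[1])), nnf(("not", d[2])))
    if dop == "all":
        return ("some", d[1], nnf(("not", d[2])))
    if dop == "some":
        return ("all", d[1], nnf(("not", d[2])))
    if dop == "atmost":
        return ("atleast", d[1] + 1, d[2], nnf(d[3]))
    # Remaining case: dop == "atleast".
    if d[1] == 0:
        # The negation of (>= 0 S.C) is unsatisfiable.
        return ("and", nnf(d[3]), nnf(("not", d[3])))
    return ("atmost", d[1] - 1, d[2], nnf(d[3]))


def global_constraints(tbox, distinguished):
    """const(K, C_K): one concept per inclusion axiom (C, D) of the T-Box
    and one tautology per distinguished concept name."""
    from_axioms = {("or", nnf(("not", c)), nnf(d)) for c, d in tbox}
    from_names = {("or", ("name", a), ("not", ("name", a))) for a in distinguished}
    return from_axioms | from_names
```

The tautologies added for the names in the distinguished set carry no information about the models of the knowledge base; their only role is to force the expansion to decide membership for these concepts.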

If not stated otherwise, in the following $K$ will denote a SHIQ knowledge
base $K = (\mathcal{A}, \mathcal{R}, \mathcal{T})$, $\mathbf{R}_K$ the roles occurring in $K$ together with their inverses,
$\mathrm{clos}(K)$ the closure of the concept names occurring in $\mathcal{A}$, $\mathbf{C}_K$ will denote a
distinguished set of concept names, and $\mathbf{I}_K$ the individual names occurring in $\mathcal{A}$.

2.2 Answering Conjunctive Queries over Knowledge Bases 

In the traditional database setting, free variables in a query are called
distinguished variables. For a query $Q$ that has $X$ as distinguished variables, the query
answering problem over $K$ consists in finding all the possible tuples of constants
$T$ of the same arity as $X$ such that when $X$ is substituted by $T$ in $Q$, it holds
that $K \models Q$. The set of such tuples $T$ is the answer of the query. Query
answering has an associated recognition problem: given a tuple $T$, the problem is to
verify whether $T$ belongs to the answer of $Q$.⁴ We say that query answering for
a certain description logic is in a class $C$ w.r.t. data complexity when the
corresponding recognition problem is in $C$. Since we will only focus on the recognition
problem, we allow conjunctive queries to contain constants and we assume
that all variables in the query are existentially quantified.

Definition 5 (conjunctive query). A conjunctive query over a knowledge
base $K$ is a sentence of the form

$$(\exists \bar{Y}).\ p_1(\bar{Y}_1) \wedge \ldots \wedge p_n(\bar{Y}_n)$$

where $p_1, \ldots, p_n$ are either roles in $\mathbf{R}_K$ or concepts in $\mathbf{C}_K$; $\bar{Y}_1, \ldots, \bar{Y}_n$ are tuples
of variables and constants. $V_Q = \bar{Y}_1 \cup \ldots \cup \bar{Y}_n$ denotes the set of variables and
constants in $Q$. The set of literals in $Q$ is $L_Q = \{p_1(\bar{Y}_1), \ldots, p_n(\bar{Y}_n)\}$, and the
cardinality of $L_Q$ will be denoted by $n_Q$.

³ If $\mathbf{C}_K = \mathrm{clos}(K)$, the algorithm can be used to check entailment w.r.t. any concept
in the knowledge base; however, this may be inconvenient from an implementation
perspective.

⁴ This problem is usually known as the query output problem.

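Conjunctive queries are interpreted in the standard way, i.e. $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ is
a model of $Q$ if there is a mapping $\sigma$ from the variables and constants in $Q$ to
objects in $\Delta^{\mathcal{I}}$ such that $\sigma$ is the identity on all constants and $\sigma(\bar{Y}) \in p^{\mathcal{I}}$ for all
$p(\bar{Y}) \in L_Q$. For a knowledge base $K$ and a query $Q$, we say that $K \models Q$ iff for
every interpretation $\mathcal{I}$, $\mathcal{I} \models K$ implies $\mathcal{I} \models Q$. Analogously, for a completion
forest $\mathcal{F}$ and a query $Q$, we say that $\mathcal{F} \models Q$ iff for every interpretation $\mathcal{I}$, $\mathcal{I} \models \mathcal{F}$
implies $\mathcal{I} \models Q$.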
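
For a single finite interpretation, the semantics of conjunctive queries can be checked by enumerating all candidate mappings. The following Python sketch does this by brute force. It is only meant to make the definition concrete and is not the entailment algorithm of this paper, which has to account for all models of the knowledge base. The encoding is an assumption of this example: a query is a list of atoms `(predicate, arguments)`, variables are strings starting with "?", and `holds` and `is_model_of_query` are hypothetical names.

```python
from itertools import product


def holds(pred, objs, concepts, roles):
    """A unary atom is looked up in the concept extensions, a binary one in the roles."""
    if len(objs) == 1:
        return objs[0] in concepts.get(pred, set())
    return objs in roles.get(pred, set())


def is_model_of_query(domain, individuals, concepts, roles, query):
    """True iff some mapping of the query variables into the domain, which sends
    every constant to its interpretation, satisfies all atoms of the query."""
    variables = sorted({t for _, args in query for t in args if t.startswith("?")})
    elements = list(domain)
    for values in product(elements, repeat=len(variables)):
        sigma = dict(zip(variables, values))

        def image(term):
            return sigma[term] if term.startswith("?") else individuals[term]

        if all(
            holds(pred, tuple(image(t) for t in args), concepts, roles)
            for pred, args in query
        ):
            return True
    return False
```

The enumeration is exponential in the number of variables of the query but polynomial in the size of the interpretation, which matches the data complexity view in which the query is considered fixed.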

Definition 6 (Conjunctive Query Entailment). Let $K$ be a knowledge base
and let $Q$ be a conjunctive query. The conjunctive query entailment problem is to
decide whether $K \models Q$.

We are interested in solving the conjunctive query entailment problem. It is
important to notice that the conjunctive query entailment problem is not
reducible to satisfiability of knowledge bases, since the negation of the query
cannot be expressed as a part of a knowledge base. For this reason, the known
algorithms for reasoning over knowledge bases do not suffice. A knowledge base $K$
has an infinite number of possibly infinite models, and we have to verify whether
the query $Q$ is entailed by all of them. In general, we want to provide an
entailment algorithm, i.e. an algorithm for checking whether a sentence $Q$ with a
particular syntax (namely, a conjunctive query) is entailed by a SHIQ
knowledge base $K$. Informally, our algorithm differs from the one proposed in [4] for
reasoning with individuals in SHIQ in the following respect: since the authors of [4]
only focus on problems that can be reduced to checking satisfiability, they only need to ensure
that if the knowledge base has some model then their algorithm will obtain a
model. In our case, however, this is not enough. We need to make sure that the
algorithm obtains a set of models $M$ such that $Q$ is entailed by $K$ iff it is entailed
by every model in $M$.

3 A SHIQ Entailment Algorithm

We will provide an algorithm for checking entailment of some sentence $Q$ in
a SHIQ knowledge base $K$, i.e. to check if all models of $K$ are models of $Q$.
Like the algorithm in [4], we will use completion forests. A completion forest 
is a relational structure that captures sets of models of a knowledge base. A 
completion forest is always finite, and it represents a set of possibly infinite 
models. When defining completion forests, we will use a parameter n that is not 
present in [4] . This parameter will be crucial in ensuring that the application of 
our algorithm will yield a set of models M such that Q is entailed by K iff it 
is entailed by every model in M. We will see later that this parameter will take 
values that depend on Q. 
3.1 Completion Forests

A forest will be defined as a set of variable trees. A variable tree is a tree where
the nodes are variables, and where the nodes and arcs of the tree are labeled.
For any nodes $n_1$ and $n_2$, $\mathcal{L}(n_1)$ will denote the label of $n_1$ and $\mathcal{L}(\langle n_1, n_2 \rangle)$ will
denote the label of the arc that goes from $n_1$ to $n_2$.

Definition 7 (n-tree equivalence). Given a variable tree $V$ s.t. $v$ is a node
of $V$, the $n$-tree of $v$ is the subtree of $V$ that has $v$ as its root and contains
the successors of $v$ that are at most $n$ direct successor arcs away. We denote by
$V^n(v)$ the set of nodes of $V$ that appear in the $n$-tree of $v$. Two variables $v$, $w$ in
$V$ are $n$-tree equivalent in $V$ if there is an isomorphism $\psi$ between their $n$-trees,
i.e. $\psi : V^n(v) \to V^n(w)$ is a bijective mapping such that:

- $\psi(v) = w$,

- for every node $n_1$ in $V^n(v)$, $\mathcal{L}(n_1) = \mathcal{L}(\psi(n_1))$,

- for every arc connecting two nodes $n_1$ and $n_2$ in $V^n(v)$,
$\mathcal{L}(\langle n_1, n_2 \rangle) = \mathcal{L}(\langle \psi(n_1), \psi(n_2) \rangle)$.

Definition 8 (n-Witness). Let $V$ be a variable tree where both $v$ and $w$ are
nodes. We say that $w$ is an $n$-witness of $v$ in $V$ iff $w$ is an ancestor of $v$ in $V$,
$w$ is $n$-tree equivalent to $v$ in $V$ and $v$ is not in the $n$-tree of $w$. Let $t$ denote the
$n$-tree of which $v$ is root, $t'$ the $n$-tree that has $w$ as root, and let $\psi$ denote an
isomorphism between $t$ and $t'$. In this case, we say that $t'$ tree-blocks $t$. For all
variables $x$ in $t$, we say that $\psi(x)$ tree-blocks $x$.
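
Definitions 7 and 8 can be made concrete with the following Python sketch, which decides n-tree equivalence by comparing canonical forms of the two n-trees and then collects the n-witnesses of a node. The representation of a variable tree by the dictionaries `children`, `parent`, `node_label` and `edge_label`, as well as the function names, are assumptions of this example; labels are assumed to be sets of strings. Comparing canonical forms is one standard way to test isomorphism of labeled, unordered trees, and it is used here only for illustration.

```python
def canon(v, n, children, node_label, edge_label):
    """Canonical form of the n-tree of v: the node label together with the
    sorted canonical forms of the subtrees reached through each outgoing arc."""
    label = tuple(sorted(node_label[v]))
    if n == 0:
        return (label, ())
    below = sorted(
        (tuple(sorted(edge_label[(v, c)])), canon(c, n - 1, children, node_label, edge_label))
        for c in children.get(v, [])
    )
    return (label, tuple(below))


def n_witnesses(v, n, children, parent, node_label, edge_label):
    """Ancestors w of v that are n-tree equivalent to v and whose n-tree does
    not contain v, i.e. that lie more than n arcs above v (nearest first)."""
    target = canon(v, n, children, node_label, edge_label)
    found = []
    w, dist = v, 0
    while w in parent:
        w, dist = parent[w], dist + 1
        if dist > n and canon(w, n, children, node_label, edge_label) == target:
            found.append(w)
    return found
```

On a chain a, x1, x2, x3, x4 in which x1 to x4 carry the same label and all arcs are labeled alike, x1 is a 1-witness of x3, whereas x2 is not, because x3 lies inside the 1-tree of x2.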

Definition 9 (n-Completion Forest).

A completion forest for a knowledge base $K$ is given by a forest of trees and
an inequality relation $\not\approx$ which is assumed to be symmetric. The forest is a set of
variable trees whose roots are the individuals in $\mathbf{I}_K$. The roots can be connected
by edges in an arbitrary way. $\mathcal{L}(x) \subseteq \mathrm{clos}(K)$ denotes the label of a node $x$, and
$\mathcal{L}(\langle x, y \rangle) \subseteq \mathbf{R}_K$ denotes the label of an edge $\langle x, y \rangle$.

If two nodes $x$, $y$ are connected by an edge with $R \in \mathcal{L}(\langle x, y \rangle)$ and $R \sqsubseteq^* S$
then $y$ is an $S$-successor of $x$ and $y$ is an $\mathrm{Inv}(S)$-predecessor of $x$. If $y$ is an
$S$-successor of $x$, then $y$ is an $S$-descendant of $x$. If $z$ is an $S$-descendant of $x$,
$y$ is an $S$-descendant of $z$ and $S \in \mathbf{R}_+$, then $y$ is an $S$-descendant of $x$. If $x$ is
an $S$-successor or an $\mathrm{Inv}(S)$-predecessor of $y$, then $x$ is an $S$-neighbour of $y$. If
$x$ is an $S$-successor of $y$ for some role $S$, then $x$ is a successor of $y$ and $y$ is a
predecessor of $x$. The transitive closure of predecessor is called ancestor.

A node is blocked iff it is not a root node and it is either directly or indirectly
blocked. A node $y$ is indirectly blocked iff one of its ancestors is blocked or if it is
a successor of a node $x$ and $\mathcal{L}(\langle x, y \rangle) = \emptyset$. A node is directly blocked iff none of
its ancestors are blocked and it is a leaf of an $n$-tree that is tree-blocked.

Definition 10 (Clash free completion forest). A node $x$ in a completion
forest $\mathcal{F}$ contains a clash iff for some concept name $C$, $C \in \mathcal{L}(x)$ and $\neg C \in \mathcal{L}(x)$,
or if $\leq n\, R.C \in \mathcal{L}(x)$ and $x$ has $n+1$ $R$-successors $y_0, \ldots, y_n$ such that $C \in \mathcal{L}(y_i)$
for all $y_i$ and $y_i \not\approx y_j \in \mathcal{F}$ for all $0 \leq i < j \leq n$. A completion forest $\mathcal{F}$ is clash
free if none of its nodes contains a clash in $\mathcal{F}$.
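
The two clash conditions of Definition 10 can be checked directly on the labels of a forest, as in the following Python sketch. The data layout (`labels`, `edges`, `distinct`) and the name `has_clash` are assumptions of this example. For brevity the sketch also assumes an empty role hierarchy, so that y counts as an R-successor of x exactly when R occurs in the label of the edge from x to y; with a non-empty hierarchy the sub-roles of R would have to be taken into account as well.

```python
from itertools import combinations


def has_clash(x, labels, edges, distinct):
    """Check both clash conditions of Definition 10 for node x.

    labels   : node -> set of concepts (nested tuples)
    edges    : (node, node) -> set of role names
    distinct : set of frozensets {y, z} recording the inequality relation
    """
    label = labels[x]
    # First condition: a concept name together with its negation.
    for c in label:
        if c[0] == "name" and ("not", c) in label:
            return True
    # Second condition: (<= n R.C) with n + 1 pairwise distinct R-successors in C.
    for c in label:
        if c[0] != "atmost":
            continue
        _, n, role, filler = c
        candidates = [
            y for (a, y), rs in edges.items()
            if a == x and role in rs and filler in labels[y]
        ]
        for group in combinations(candidates, n + 1):
            if all(frozenset(pair) in distinct for pair in combinations(group, 2)):
                return True
    return False
```

Note that n + 1 successors in C only yield a clash when they are explicitly asserted to be pairwise distinct; otherwise the ≤-rule can still merge two of them.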
Definition 11 (Complete completion forest). A completion forest $\mathcal{F}$ is complete
if none of the rules in Table 1 can be applied to it.

3.2 The Completion Forest Algorithm 

Given a knowledge base $K = (\mathcal{A}, \mathcal{R}, \mathcal{T})$ and a blocking parameter $n$, the
algorithm does the following: an initial completion forest for $K$ is built and it is
expanded using the rules in Table 1 until no more expansions can be obtained.
The (possibly empty) set of complete and clash-free $n$-completion forests
obtained by this expansion induces a set of models for $K$. As we will see in the
coming sections, this set of models can be used to check entailment of a
conjunctive query $Q$ if a suitable $n$ (depending on $Q$) is used.

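Initializing the Completion Forest. An initial completion forest $\mathcal{F}_K$ for a
knowledge base $K$ is constructed as follows:

- For each individual $a_i \in \mathbf{I}_K$ a node $a_i$ is introduced.

- An edge $\langle a_i, a_j \rangle$ is created iff $R(a_i, a_j) \in \mathcal{A}$ for some role $R$.

- The labels of these nodes and edges as well as the $\not\approx$ relation are initialized
as follows:

$\mathcal{L}(a_i) := \{C \mid C(a_i) \in \mathcal{A}\} \cup \mathrm{const}(K, \mathbf{C}_K)$
$\mathcal{L}(\langle a_i, a_j \rangle) := \{R \mid R(a_i, a_j) \in \mathcal{A}\}$
$a_i \not\approx a_j$ iff $a_i \not\approx a_j \in \mathcal{A}$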
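
The initialization step can be sketched in Python as follows. The encoding of the A-Box by three collections (concept assertions, role assertions and inequalities) and the name `initial_forest` are assumptions made for this example; the set `const` is assumed to hold the global constraint concepts of the knowledge base, however they are represented.

```python
def initial_forest(concept_assertions, role_assertions, inequalities, const):
    """Build the node labels, edge labels and inequality relation of the
    initial completion forest from an A-Box.

    concept_assertions : iterable of (concept, individual)
    role_assertions    : iterable of (role, individual, individual)
    inequalities       : iterable of (individual, individual)
    const              : set of global constraint concepts
    """
    individuals = (
        {a for _, a in concept_assertions}
        | {x for _, a, b in role_assertions for x in (a, b)}
        | {x for pair in inequalities for x in pair}
    )
    # Every root node starts with the global constraint concepts in its label.
    labels = {a: set(const) for a in individuals}
    for concept, a in concept_assertions:
        labels[a].add(concept)
    edges = {}
    for role, a, b in role_assertions:
        edges.setdefault((a, b), set()).add(role)
    # The inequality relation is symmetric, so unordered pairs are stored.
    distinct = {frozenset(pair) for pair in inequalities}
    return labels, edges, distinct
```

The construction is linear in the size of the A-Box, so the initial forest is just another representation of the data.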

Expanding the Completion Forests. From the initial completion forest, new
completion forests for $K$ can be obtained by applying the rules in Table 1. Note
that the application of the rules is non-deterministic. Different choices for $E$
in the $\sqcup$-rule and the choose-rule generate different forests. The $\exists$-rule and the
$\geq$-rule are called generating rules since they add new nodes to the forest.

The set of $n$-completion forests for a knowledge base $K$ is denoted by $\mathbb{F}^n_K$
and it is the smallest set satisfying the following conditions:

1. The initial completion forest $\mathcal{F}_K$ is an $n$-completion forest for $K$.

2. If $\mathcal{F}$ is a legal $n$-completion forest for $K$ and $\mathcal{F}'$ can be obtained from $\mathcal{F}$
by applying one of the rules in Table 1 using $n$-blocking, then $\mathcal{F}'$ is an
$n$-completion forest for $K$.

Completion Forests as Semantic Objects. Semantically, we can interpret
a completion forest in the way we interpret a knowledge base. For a knowledge
base $K$ and a completion forest $\mathcal{F}$ for $K$, note that all the individuals in $\mathbf{I}_K$
are nodes in $\mathcal{F}$, node labels in $\mathcal{F}$ are concepts in $\mathrm{clos}(K) \cup \mathbf{C}_K$ and edge labels
in $\mathcal{F}$ are roles in $\mathbf{R}_K$, hence interpretations for $K$ can be interpretations for $\mathcal{F}$
and vice versa. We will see completion forests as a representation of a set of
models of the knowledge base. It is not common practice to give a semantic
interpretation to completion forests. However, this reading will simplify some
of our results and proofs.
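$\sqcap$-rule:
  if $C_1 \sqcap C_2 \in \mathcal{L}(x)$, $x$ is not indirectly blocked
     and $\{C_1, C_2\} \not\subseteq \mathcal{L}(x)$
  then $\mathcal{L}(x) := \mathcal{L}(x) \cup \{C_1, C_2\}$

$\sqcup$-rule:
  if $C_1 \sqcup C_2 \in \mathcal{L}(x)$, $x$ is not indirectly blocked
     and $\{C_1, C_2\} \cap \mathcal{L}(x) = \emptyset$
  then $\mathcal{L}(x) := \mathcal{L}(x) \cup \{E\}$ for some $E \in \{C_1, C_2\}$

$\exists$-rule:
  if $\exists S.C \in \mathcal{L}(x)$, $x$ is not blocked and
     $x$ has no $S$-neighbour $y$ with $C \in \mathcal{L}(y)$
  then create new node $y$ with $\mathcal{L}(\langle x, y \rangle) := \{S\}$
     and $\mathcal{L}(y) := \{C\} \cup \mathrm{const}(K, \mathbf{C}_K)$

$\forall$-rule:
  if $\forall S.C \in \mathcal{L}(x)$, $x$ is not indirectly blocked and
     there is an $S$-neighbour $y$ of $x$ with $C \notin \mathcal{L}(y)$
  then $\mathcal{L}(y) := \mathcal{L}(y) \cup \{C\}$

$\forall_+$-rule:
  if $\forall S.C \in \mathcal{L}(x)$, $x$ is not indirectly blocked,
     there is some $R$ with $\mathrm{Trans}(R)$ and $R \sqsubseteq^* S$ and
     there is an $R$-neighbour $y$ of $x$ with $\forall R.C \notin \mathcal{L}(y)$
  then $\mathcal{L}(y) := \mathcal{L}(y) \cup \{\forall R.C\}$

choose-rule:
  if $\leq n\, S.C \in \mathcal{L}(x)$ or $\geq n\, S.C \in \mathcal{L}(x)$,
     $x$ is not indirectly blocked and
     there is an $S$-neighbour $y$ of $x$ with $\{C, \mathrm{NNF}(\neg C)\} \cap \mathcal{L}(y) = \emptyset$
  then $\mathcal{L}(y) := \mathcal{L}(y) \cup \{E\}$ for some $E \in \{C, \mathrm{NNF}(\neg C)\}$

$\geq$-rule:
  if $\geq n\, S.C \in \mathcal{L}(x)$, $x$ is not blocked and
     there are not $n$ $S$-neighbours $y_1, \ldots, y_n$ of $x$ such that $C \in \mathcal{L}(y_i)$
     and $y_i \not\approx y_j$ for $1 \leq i < j \leq n$
  then create $n$ new nodes $y_1, \ldots, y_n$ with $\mathcal{L}(\langle x, y_i \rangle) := \{S\}$,
     $\mathcal{L}(y_i) := \{C\} \cup \mathrm{const}(K, \mathbf{C}_K)$ and $y_i \not\approx y_j$ for $1 \leq i < j \leq n$

$\leq$-rule:
  if $\leq n\, S.C \in \mathcal{L}(x)$,
     $x$ is not indirectly blocked,
     $|\{y \mid y$ is an $S$-neighbour of $x$ and $C \in \mathcal{L}(y)\}| > n$ and
     there are $S$-neighbours $y$, $z$ of $x$ with not $y \not\approx z$,
     $y$ is neither a root node nor an ancestor of $z$
     and $C \in \mathcal{L}(y) \cap \mathcal{L}(z)$
  then $\mathcal{L}(z) := \mathcal{L}(z) \cup \mathcal{L}(y)$,
     if $z$ is an ancestor of $x$,
       then $\mathcal{L}(\langle z, x \rangle) := \mathcal{L}(\langle z, x \rangle) \cup \mathrm{Inv}(\mathcal{L}(\langle x, y \rangle))$,
       else $\mathcal{L}(\langle x, z \rangle) := \mathcal{L}(\langle x, z \rangle) \cup \mathcal{L}(\langle x, y \rangle)$,
     $\mathcal{L}(\langle x, y \rangle) := \emptyset$,
     set $u \not\approx z$ for all $u$ with $u \not\approx y$

$\leq_r$-rule:
  if $\leq n\, S.C \in \mathcal{L}(x)$,
     $|\{y \mid y$ is an $S$-neighbour of $x$ and $C \in \mathcal{L}(y)\}| > n$ and
     there are $S$-neighbours $y$, $z$ of $x$ with not $y \not\approx z$,
     both $y$ and $z$ are root nodes and $C \in \mathcal{L}(y) \cap \mathcal{L}(z)$
  then $\mathcal{L}(z) := \mathcal{L}(z) \cup \mathcal{L}(y)$, $\mathcal{L}(y) := \emptyset$,
     for all edges $\langle y, w \rangle$:
       $\mathcal{L}(\langle z, w \rangle) := \mathcal{L}(\langle z, w \rangle) \cup \mathcal{L}(\langle y, w \rangle)$, $\mathcal{L}(\langle y, w \rangle) := \emptyset$,
     for all edges $\langle w, y \rangle$:
       $\mathcal{L}(\langle w, z \rangle) := \mathcal{L}(\langle w, z \rangle) \cup \mathcal{L}(\langle w, y \rangle)$, $\mathcal{L}(\langle w, y \rangle) := \emptyset$,
     set $y \approx z$ and $u \not\approx z$ for all $u$ with $u \not\approx y$

Table 1. Expansion Rules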
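
To illustrate how the non-deterministic rules of Table 1 give rise to several forests, the following Python sketch applies only the ⊓-rule and the ⊔-rule to the label of a single node and returns every complete label that can be reached. It is a deliberately small fragment: blocking, edges and all other rules are left out, and the tuple encoding of concepts and the names `expand` and `clash_free` are assumptions of this example.

```python
def expand(label):
    """Exhaustively apply the conjunction and disjunction rules to one node
    label and return the list of all complete labels, one per branch."""
    label = frozenset(label)
    # Sorting by repr only serves to make the order of rule application deterministic.
    for c in sorted(label, key=repr):
        if c[0] == "and" and not {c[1], c[2]} <= label:
            # Deterministic rule: both conjuncts are added.
            return expand(label | {c[1], c[2]})
        if c[0] == "or" and not ({c[1], c[2]} & label):
            # Non-deterministic rule: one branch per choice of disjunct.
            return expand(label | {c[1]}) + expand(label | {c[2]})
    return [label]


def clash_free(label):
    """True iff the label contains no concept name together with its negation."""
    return not any(c[0] == "name" and ("not", c) in label for c in label)
```

Starting from the label {(A ⊔ B) ⊓ ¬A}, the expansion produces two complete labels, one containing A and one containing B; only the second is clash free, mirroring the way complete and clash-free forests are selected.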
Definition 12 (Model of a completion forest). For an $n$-completion forest
$\mathcal{F}$ for $K$, $\mathcal{F} \in \mathbb{F}^n_K$, an interpretation $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ is a model of $\mathcal{F}$, represented
$\mathcal{I} \models \mathcal{F}$, if $\mathcal{I} \models K$ and for all nodes $x, y \in \mathcal{F}$ the following hold:

- if $C \in \mathcal{L}(x)$, then $x^{\mathcal{I}} \in C^{\mathcal{I}}$

- if $R \in \mathcal{L}(\langle x, y \rangle)$, then $(x^{\mathcal{I}}, y^{\mathcal{I}}) \in R^{\mathcal{I}}$

- if $x \not\approx y \in \mathcal{F}$, then $x^{\mathcal{I}} \neq y^{\mathcal{I}}$

We want to emphasize that in order to be a model of a completion forest
for $K$, an interpretation must be a model of $K$. The initial completion forest is
just an alternative representation of the knowledge base, and it has exactly the
same models. When we expand the forest, we will make choices and obtain new
forests that capture a subset of the models of the knowledge base. Note that
if an interpretation $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ is a model of $\mathcal{F}$, then all nodes in $\mathcal{F}$ will be
mapped to an object in $\Delta^{\mathcal{I}}$; however, there might be objects in $\Delta^{\mathcal{I}}$ that are not
the image of any node in $\mathcal{F}$.

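Lemma 1. An interpretation $\mathcal{I}$ is a model of $\mathcal{F}_K$ iff $\mathcal{I}$ is a model of $K$.

Proof. The only-if direction follows from Definition 12. To prove the other direction,
it suffices to consider an arbitrary model $\mathcal{I}$ of $K$ and verify that for all nodes
$x, y \in \mathcal{F}_K$ the following hold:

(i) if $C \in \mathcal{L}(x)$, then $x^{\mathcal{I}} \in C^{\mathcal{I}}$
(ii) if $R \in \mathcal{L}(\langle x, y \rangle)$, then $(x^{\mathcal{I}}, y^{\mathcal{I}}) \in R^{\mathcal{I}}$
(iii) if $x \not\approx y \in \mathcal{F}_K$, then $x^{\mathcal{I}} \neq y^{\mathcal{I}}$

By definition, the nodes in $\mathcal{F}_K$ correspond exactly to the individuals in $\mathbf{I}_K$.
For each of these individuals $a_i$, the label of $a_i$ in $\mathcal{F}_K$ is given as
$\mathcal{L}(a_i) = \{C \mid C(a_i) \in \mathcal{A}\} \cup \mathrm{const}(K, \mathbf{C}_K)$. Since $\mathcal{I}$ is a model of $\mathcal{A}$, if $C(a_i) \in \mathcal{A}$ then
$a_i^{\mathcal{I}} \in C^{\mathcal{I}}$. For any concept $C \in \mathrm{const}(K, \mathbf{C}_K)$, either $C$ is of the form $\neg D \sqcup E$
for some $D \sqsubseteq E$ in $\mathcal{T}$ or $C$ is of the form $D \sqcup \neg D$ for an arbitrary concept
$D$. In the first case, $a_i^{\mathcal{I}} \in (\neg D \sqcup E)^{\mathcal{I}}$ must hold because $\mathcal{I}$ is a model of $\mathcal{T}$.
In the other case, $x^{\mathcal{I}} \in (D \sqcup \neg D)^{\mathcal{I}}$ holds for any individual $x$ in $\Delta^{\mathcal{I}}$ and any
concept $D$ by the definition of interpretation. So we have that $a_i^{\mathcal{I}} \in C^{\mathcal{I}}$ for every
$C \in \mathcal{L}(a_i)$ and item (i) holds. The label of a pair of nodes $a_i$, $a_j$ in $\mathcal{F}_K$ is given
by $\mathcal{L}(\langle a_i, a_j \rangle) = \{R \mid R(a_i, a_j) \in \mathcal{A}\}$. Since $\mathcal{I}$ is a model of $\mathcal{A}$, $(a_i^{\mathcal{I}}, a_j^{\mathcal{I}}) \in R^{\mathcal{I}}$
for every $R(a_i, a_j)$ in $\mathcal{A}$, hence item (ii) holds. Analogously, the $\not\approx$ relation was
initialized with $a_i \not\approx a_j$ for every $a_i \not\approx a_j$ in $\mathcal{A}$, so item (iii) will also hold for
any model $\mathcal{I}$ of $\mathcal{A}$.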

Finally, for a set of completion forests $\mathbf{F}$, we will denote by $\mathrm{ccf}(\mathbf{F})$ the set
of forests in $\mathbf{F}$ that are complete and clash free. For a knowledge base $K$, the
union of all the models of the forests in $\mathrm{ccf}(\mathbb{F}^n_K)$ captures all the models of $K$,
as we prove in Proposition 1. This result is crucial, since it allows us to ensure
that checking the forests in $\mathrm{ccf}(\mathbb{F}^n_K)$ suffices to check all models of $K$. In order
to prove this result, we will first prove the following lemma. It states that when
applying any of the rules in Table 1, no models are lost.
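Lemma 2. Let $\mathcal{F}$ be a completion forest in $\mathbb{F}^n_K$, let $r$ be a rule in Table 1 and
let $\mathbf{F}$ be the set of $n$-completion forests that can be obtained from $\mathcal{F}$ by applying
$r$. Then for every $\mathcal{I}$ such that $\mathcal{I} \models \mathcal{F}$ there is some $\mathcal{F}' \in \mathbf{F}$ such that $\mathcal{I} \models \mathcal{F}'$.

Proof. We will do the proof for each rule $r$ in Table 1.

First we will consider the deterministic, non-generating rules. There is only
one $\mathcal{F}'$ in $\mathbf{F}$ and the models of $\mathcal{F}$ are exactly the models of $\mathcal{F}'$. For the case of
the $\sqcap$-rule, there is some node $x$ in $\mathcal{F}$ s.t. $C_1 \sqcap C_2 \in \mathcal{L}(x)$. Since $\mathcal{I}$ is a model of
$\mathcal{F}$, then $x^{\mathcal{I}} \in (C_1 \sqcap C_2)^{\mathcal{I}}$, and since $\mathcal{I}$ is a model of $K$, then both $x^{\mathcal{I}} \in C_1^{\mathcal{I}}$ and
$x^{\mathcal{I}} \in C_2^{\mathcal{I}}$ hold. The inequality relation and all labels in $\mathcal{F}'$ are exactly as in $\mathcal{F}$,
the only change is that $\{C_1, C_2\} \subseteq \mathcal{L}(x)$ in $\mathcal{F}'$, so $\mathcal{I} \models \mathcal{F}'$.

The cases of the $\forall$-rule and the $\forall_+$-rule are similar to the $\sqcap$-rule. All
labels of $\mathcal{F}$ are preserved in $\mathcal{F}'$. Only the label of the node $y$ to which the rule
was applied is modified, having in $\mathcal{F}'$ $C \in \mathcal{L}(y)$ or $\forall R.C \in \mathcal{L}(y)$ respectively.
Since $\mathcal{I}$ is a model of $K$, $x^{\mathcal{I}} \in (\forall S.C)^{\mathcal{I}}$ and $y$ an $S$-neighbour of $x$ imply
$y^{\mathcal{I}} \in C^{\mathcal{I}}$, and $x^{\mathcal{I}} \in (\forall S.C)^{\mathcal{I}}$ and $y$ an $R$-neighbour of $x$ for some transitive
sub-role $R$ of $S$ imply $y^{\mathcal{I}} \in (\forall R.C)^{\mathcal{I}}$; then trivially $\mathcal{I} \models \mathcal{F}'$ in both cases.

Let us analyze the non-deterministic rules. For the case of the $\sqcup$-rule, there
is some node $x$ in $\mathcal{F}$ s.t. $C_1 \sqcup C_2 \in \mathcal{L}(x)$. After applying the $\sqcup$-rule, we will have
two forests $\mathcal{F}'_1$, $\mathcal{F}'_2$ with $\{C_1\} \subseteq \mathcal{L}(x)$ in $\mathcal{F}'_1$ and $\{C_2\} \subseteq \mathcal{L}(x)$ in $\mathcal{F}'_2$ respectively.
For every $\mathcal{I}$ such that $\mathcal{I}$ is a model of $\mathcal{F}$ we have $x^{\mathcal{I}} \in (C_1 \sqcup C_2)^{\mathcal{I}}$, and since
$\mathcal{I}$ is a model of $K$, then either $x^{\mathcal{I}} \in C_1^{\mathcal{I}}$ or $x^{\mathcal{I}} \in C_2^{\mathcal{I}}$ holds. If it is the case that
$x^{\mathcal{I}} \in C_1^{\mathcal{I}}$, then $\mathcal{I} \models \mathcal{F}'_1$, and otherwise $\mathcal{I} \models \mathcal{F}'_2$, so the claim holds.

The proof for the choose-rule is trivial, since after its application we will have
two forests $\mathcal{F}'_1$, $\mathcal{F}'_2$ with $\{C\} \subseteq \mathcal{L}(x)$ in $\mathcal{F}'_1$ and $\{\neg C\} \subseteq \mathcal{L}(x)$ in $\mathcal{F}'_2$ respectively,
but since trivially $x^{\mathcal{I}} \in (C \sqcup \neg C)^{\mathcal{I}}$ holds for any $x$, any $C$ and any model $\mathcal{I}$ of
$K$, then for every $\mathcal{I}$ either $\mathcal{I} \models \mathcal{F}'_1$ or $\mathcal{I} \models \mathcal{F}'_2$ holds.

When the $\leq$-rule or the $\leq_r$-rule is applied to a variable $x$ in $\mathcal{F}$, there are
some variables $y$, $z$ neighbours of $x$ s.t. $y$ is identified with $z$ in $\mathcal{F}'$. This can
only be done if we do not have that $z^{\mathcal{I}} \neq y^{\mathcal{I}}$ in $\mathcal{I}$, hence it must be the case that
$z^{\mathcal{I}} = y^{\mathcal{I}}$. In $\mathcal{F}'$, we will add the pair $(z, y)$ to the extension of $\approx$. Due to $z^{\mathcal{I}} = y^{\mathcal{I}}$
the extensions of all labels of $\mathcal{F}$ will be preserved in $\mathcal{F}'$ and so $\mathcal{I} \models \mathcal{F}'$ holds.

Finally we consider the two generating rules. For the case of the $\exists$-rule, since
the propagation rule was applied, there is some $x$ in $\mathcal{F}$ such that $\exists R.C \in \mathcal{L}(x)$,
which implies the existence of some $o \in \Delta^{\mathcal{I}}$ with $(x^{\mathcal{I}}, o) \in R^{\mathcal{I}}$ and $o \in C^{\mathcal{I}}$. $\mathcal{F}'$
was obtained by adding to $\mathcal{F}$ a new node which we denote $y$. This node will
make explicit in $\mathcal{F}'$ the existence of $o$, and we will have that $y^{\mathcal{I}} = o$, so $\mathcal{I} \models \mathcal{F}'$.

The case of the $\geq$-rule is analogous to the $\exists$-rule, since in models of $\mathcal{F}'$ we
have that $y_i^{\mathcal{I}} = o_i$ for $1 \leq i \leq n$, where $\{y_1, \ldots, y_n\}$ are the variables added to
$\mathcal{F}$ and $o_1, \ldots, o_n$ denote the elements in $\Delta^{\mathcal{I}}$ s.t. $(x^{\mathcal{I}}, o_i) \in R^{\mathcal{I}}$ and $o_i \in C^{\mathcal{I}}$ for
the variable $x$ in $\mathcal{F}$ to which the rule was applied.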

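Finally, we can prove that the union of models of the forests in $\mathrm{ccf}(\mathbb{F}^n_K)$ is
exactly the set of all models of $K$.

Proposition 1. For every $\mathcal{I}$ such that $\mathcal{I} \models K$, there is some $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}^n_K)$ with
$n \geq 0$ such that $\mathcal{I} \models \mathcal{F}$.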
Proof. From Lemmas 1 and 2, we have that for every $\mathcal{I}$ such that $\mathcal{I} \models K$, there
is some $\mathcal{F} \in \mathbb{F}^n_K$ with $n \geq 0$ such that $\mathcal{I}$ is a model of $\mathcal{F}$. Now we want to
prove that there is some $\mathcal{F}_c \in \mathrm{ccf}(\mathbb{F}^n_K)$ such that $\mathcal{I} \models \mathcal{F}_c$. Suppose there is an
interpretation $\mathcal{I}$ such that $\mathcal{I}$ is a model of some completion forest $\mathcal{F}$ that is not
complete. Then either it is possible to obtain a new forest $\mathcal{F}'$ such that $\mathcal{I} \models \mathcal{F}'$,
or none of the propagation rules can be applied. The latter would imply that
either $\mathcal{F}$ was complete, which is a contradiction, or that $\mathcal{F}$ had a clash, which is
also a contradiction since $\mathcal{F}$ has a model. Hence, while applying the propagation
rules, the model will be preserved until some complete forest $\mathcal{F}_c$ is reached.

3.3 Tableaux and Canonical Models 

We will define a tableau for a knowledge base. A tableau is only a representation
of a model of a knowledge base; however, it may be infinite. Intuitively, a tableau
is a model captured by a complete and clash free completion forest $\mathcal{F}$ and it will
provide a natural way of building a canonical interpretation of $\mathcal{F}$. Note that
if $\mathcal{F}$ contains blocked nodes, then it is capturing a set of potentially infinite
models. In this case, its tableau must be an infinite structure. The tableau $T$ of
a forest $\mathcal{F}$ will correspond to the unraveling of $\mathcal{F}$, i.e. the structure obtained by
considering each path to a node in $\mathcal{F}$ as a node of $T$. Following [4], we will give
a rather complex definition of a tableau. Defining a model of $K$ from a tableau
will be straightforward with this definition, and the many conditions required
for a tableau are met by complete and clash free completion forests.

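Definition 13 (Tableau). $T = (\mathbf{S}, \mathcal{L}, \mathcal{E}, \mathcal{I})$ is a tableau for a knowledge base
$K = (\mathcal{A}, \mathcal{R}, \mathcal{T})$ iff

- $\mathbf{S}$ is a non-empty set,

- $\mathcal{L} : \mathbf{S} \to 2^{\mathrm{clos}(K)}$ maps each element in $\mathbf{S}$ to a set of concepts,

- $\mathcal{E} : \mathbf{R}_K \to 2^{\mathbf{S} \times \mathbf{S}}$ maps each role to a set of pairs of elements in $\mathbf{S}$, and

- $\mathcal{I} : \mathbf{I}_K \to \mathbf{S}$ maps each individual occurring in $\mathcal{A}$ to an element in $\mathbf{S}$.

Furthermore, for all $s, t \in \mathbf{S}$; $C, C_1, C_2 \in \mathrm{clos}(K)$ and $R, S \in \mathbf{R}_K$, $T$ satisfies:

(P1) if $C \in \mathcal{L}(s)$, then $\neg C \notin \mathcal{L}(s)$,
(P2) if $C_1 \sqcap C_2 \in \mathcal{L}(s)$, then $C_1 \in \mathcal{L}(s)$ and $C_2 \in \mathcal{L}(s)$,
(P3) if $C_1 \sqcup C_2 \in \mathcal{L}(s)$, then $C_1 \in \mathcal{L}(s)$ or $C_2 \in \mathcal{L}(s)$,
(P4) if $\forall S.C \in \mathcal{L}(s)$ and $\langle s, t \rangle \in \mathcal{E}(S)$, then $C \in \mathcal{L}(t)$,
(P5) if $\exists S.C \in \mathcal{L}(s)$, then there is some $t \in \mathbf{S}$ such that $\langle s, t \rangle \in \mathcal{E}(S)$ and
$C \in \mathcal{L}(t)$,
(P6) if $\forall S.C \in \mathcal{L}(s)$ and $\langle s, t \rangle \in \mathcal{E}(R)$ for some $R \sqsubseteq^* S$ with $\mathrm{Trans}(R) = \mathit{true}$
then $\forall S.C \in \mathcal{L}(t)$,
(P7) $\langle s, t \rangle \in \mathcal{E}(R)$ iff $\langle t, s \rangle \in \mathcal{E}(\mathrm{Inv}(R))$,
(P8) if $\langle s, t \rangle \in \mathcal{E}(R)$ and $R \sqsubseteq^* S$ then $\langle s, t \rangle \in \mathcal{E}(S)$,
(P9) if $\leq n\, S.C \in \mathcal{L}(s)$, then $|\{t \in \mathbf{S} \mid \langle s, t \rangle \in \mathcal{E}(S)$ and $C \in \mathcal{L}(t)\}| \leq n$,
(P10) if $\geq n\, S.C \in \mathcal{L}(s)$, then $|\{t \in \mathbf{S} \mid \langle s, t \rangle \in \mathcal{E}(S)$ and $C \in \mathcal{L}(t)\}| \geq n$,
(P11) if $\langle s, t \rangle \in \mathcal{E}(R)$ and either $\leq n\, S.C \in \mathcal{L}(s)$ or $\geq n\, S.C \in \mathcal{L}(s)$, then
$C \in \mathcal{L}(t)$ or $\mathrm{NNF}(\neg C) \in \mathcal{L}(t)$,
(P12) if $C(a) \in \mathcal{A}$ then $C \in \mathcal{L}(\mathcal{I}(a))$,
(P13) if $R(a, b) \in \mathcal{A}$ then $\langle \mathcal{I}(a), \mathcal{I}(b) \rangle \in \mathcal{E}(R)$,
(P14) if $a \not\approx b \in \mathcal{A}$ then $\mathcal{I}(a) \neq \mathcal{I}(b)$,
(P15) if $C \in \mathrm{const}(K, \mathbf{C}_K)$, then for all $s \in \mathbf{S}$, $C \in \mathcal{L}(s)$.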

Trivially, we can obtain a canonical model of a knowledge base from a tableau 
for it. 

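Definition 14 (Canonical Model of a Tableau). Let $T$ be a tableau. The
canonical model of $T$, $\mathcal{I}_T = (\Delta^{\mathcal{I}_T}, \cdot^{\mathcal{I}_T})$, is defined as follows:

$\Delta^{\mathcal{I}_T} := \mathbf{S}$

for all concept names $A$ in $\mathrm{clos}(K)$,

$A^{\mathcal{I}_T} := \{s \mid A \in \mathcal{L}(s)\}$

for all individual names $a$ in $\mathbf{I}_K$,

$a^{\mathcal{I}_T} := \mathcal{I}(a)$

for all role names $R$ in $\mathcal{R}$,

$R^{\mathcal{I}_T} := \mathcal{E}(R)^{\oplus}$

where $\mathcal{E}(R)^{\oplus}$ is the closure of the extension of $R$ under $\mathcal{R}$, which is defined as:

$$
\mathcal{E}(R)^{\oplus} := \begin{cases}
(\mathcal{E}(R))^+ & \text{if } \mathrm{Trans}(R) \\
\mathcal{E}(R) \cup \mathrm{sub}(\mathcal{E}(R)^{\oplus}) & \text{otherwise}
\end{cases}
$$

where $(\mathcal{E}(R))^+$ denotes the transitive closure of $\mathcal{E}(R)$ and
$\mathrm{sub}(\mathcal{E}(R)^{\oplus}) = \bigcup_{P \sqsubseteq^* R,\ P \neq R} \mathcal{E}(P)^{\oplus}$.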
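
The closure of a role extension used in Definition 14 can be computed recursively, as the following Python sketch illustrates for finite extensions. The dictionaries `E` (role to set of pairs) and `sub_roles` (role to its proper sub-roles), the set `transitive` and the function names are assumptions of this example. The recursion terminates under the assumption made in Section 2.1 that no role is both a sub-role and a super-role of another.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing the given set of pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def closed_extension(role, E, sub_roles, transitive):
    """Closure of the extension of a role under the role hierarchy.

    E          : role -> set of pairs (the tableau relation for that role)
    sub_roles  : role -> iterable of its proper sub-roles
    transitive : set of roles R for which Trans(R) holds
    """
    if role in transitive:
        return transitive_closure(E.get(role, set()))
    result = set(E.get(role, set()))
    for p in sub_roles.get(role, ()):
        result |= closed_extension(p, E, sub_roles, transitive)
    return result
```

The second case matters for non-transitive roles with transitive sub-roles: the pairs added by the transitive closure of a sub-role must also belong to the extension of every super-role.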

Lemma 3. Let T be a tableau for K. The canonical model of T is a model of 
K. 

Proof. That $\mathcal{I}_T$ is a model of $\mathcal{R}$ and $\mathcal{A}$ can be proved exactly as in the proof of
Lemma 2 in [4]. Due to (P15), it can be easily verified that $\mathcal{I}_T$ is also a model
of $\mathcal{T}$.

Canonical Interpretation of a Completion Forest. A completion forest $\mathcal{F}$
induces a tableau $T_{\mathcal{F}}$, and this tableau gives us a canonical model for $\mathcal{F}$.

Definition 15 (Tableau induced by a completion forest). A path in a
completion forest $\mathcal{F}$ is a sequence of pairs of nodes of the form
$p = [\frac{x_0}{x'_0}, \ldots, \frac{x_n}{x'_n}]$. In such a path, we define $\mathrm{tail}(p) = x_n$ and
$\mathrm{tail}'(p) = x'_n$; and $[p \mid \frac{x_{n+1}}{x'_{n+1}}]$ denotes the path
$[\frac{x_0}{x'_0}, \ldots, \frac{x_n}{x'_n}, \frac{x_{n+1}}{x'_{n+1}}]$. For any path $p$ and variable $z$, if $z$ is not
blocked and $z$ is an $R$-successor of $\mathrm{tail}(p)$, then $[p \mid \frac{z}{z}]$ is an $R$-step of $p$.
If $z'$ is blocked by $z$ and $z'$ is an $R$-successor of $\mathrm{tail}(p)$, then $[p \mid \frac{z}{z'}]$ is an
$R$-step of $p$. If $q$ is an $R$-step of $p$ for some role $R$, then $q$ is a step of $p$ and
$p$ is a prefix of $q$. The transitive closure of prefix is called subpath.

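Given a completion forest $\mathcal{F}$, the set $\mathrm{paths}(\mathcal{F})$ is defined inductively as
follows:

- If $x_0$ is a root in $\mathcal{F}$, then $[\frac{x_0}{x_0}] \in \mathrm{paths}(\mathcal{F})$.

- If $p \in \mathrm{paths}(\mathcal{F})$ and $q$ is a step of $p$, then $q \in \mathrm{paths}(\mathcal{F})$.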

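The tableau $T_{\mathcal{F}} = (\mathbf{S}, \mathcal{L}, \mathcal{E}, \mathcal{I})$ induced by the completion forest $\mathcal{F}$ is defined
as follows:

$\mathbf{S} = \mathrm{paths}(\mathcal{F}) \setminus \{p \mid p \in \mathrm{paths}(\mathcal{F}) \text{ and } p = [\frac{x}{x}] \text{ for some } x \text{ with } \mathcal{L}(x) = \emptyset\}$
$\mathcal{L}(p) = \mathcal{L}(\mathrm{tail}(p))$

$\mathcal{E}(R) = \{(p, q) \in \mathbf{S} \times \mathbf{S} \mid q \text{ is an } R\text{-step of } p\} \ \cup$

$\quad \{(p, q) \in \mathbf{S} \times \mathbf{S} \mid p \text{ is an } \mathrm{Inv}(R)\text{-step of } q\} \ \cup$

$\quad \{([\frac{x}{x}], [\frac{y}{y}]) \in \mathbf{S} \times \mathbf{S} \mid x, y \text{ are root nodes and } x \text{ is an } R\text{-neighbour of } y\}$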

Lemma 4. Every $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^n)$ for $n \geq 1$ induces a canonical model $\mathcal{I}_{\mathcal{F}}$ for $K$.

Proof. First, it is proved as in [4] that every $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^n)$ for $n \geq 1$ induces
a tableau $T_{\mathcal{F}}$ for $K$. For the last item of the proof of (P9), note that since
$n \geq 1$, pairwise blocking is subsumed and the existence of the predecessor $u$ can
be ensured. (P15) also holds due to the following facts:

- All nodes $x$ are initialized with $\mathrm{const}(K, C_K) \subseteq \mathcal{L}(x)$.

- The concept names in $\mathrm{const}(K, C_K)$ are never removed from the label of a
node unless the label is set to $\emptyset$ by the $\leq_r$-rule. In this case, the label of the
node is never modified again.

Since $T_{\mathcal{F}}$ is a tableau for $K$, it has a canonical model that is a model of $K$.
The canonical model of $\mathcal{F}$ is $\mathcal{I}_{\mathcal{F}}$.

4 Answering Conjunctive Queries 

For a knowledge base $K$ and a query $Q$, we say that $K \models Q$ iff for every
interpretation $\mathcal{I}$, $\mathcal{I} \models K$ implies $\mathcal{I} \models Q$. Analogously, for a completion forest $\mathcal{F}$
and a query $Q$, we say that $\mathcal{F} \models Q$ iff for every interpretation $\mathcal{I}$, $\mathcal{I} \models \mathcal{F}$ implies
$\mathcal{I} \models Q$. We are interested in solving the conjunctive query entailment problem.
However, a knowledge base $K$ has an infinite number of possibly infinite models.
The problem is then how to verify that the query $Q$ is entailed by all of them.
The key point is that for a given $Q$ it is sufficient to consider the set of complete
and clash-free $N$-completion forests for $K$, where $N$ is a number that depends
on $Q$. Then, we only have to verify a finite number of structures, all of them
of finite size. In order to provide a sound and complete algorithm for answering
conjunctive queries, we have to prove the following:

I. If $K \models Q$ then for every $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^N)$ we can find a mapping from the
variables in $Q$ to the variables in $\mathcal{F}$ that witnesses the entailment of the
query.

II. If $K$ does not entail $Q$, then there will be some $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^N)$ into which $Q$
cannot be mapped.

From I and II, we obtain an algorithm for checking conjunctive query
entailment that works as follows: an initial completion forest for $K$ is built and
expanded using a suitable $N$-blocking as termination condition. Then $Q$ is
entailed by $K$ iff the query can be mapped into every complete and clash-free
completion forest obtained.

In the following, we will use $Q$ to denote a conjunctive query. We say that $Q$
can be mapped into a completion forest $\mathcal{F}$, denoted $\models_{\mathcal{F}} Q$, if there is a mapping
$\sigma : V_Q \to V_{\mathcal{F}}$ that is the identity mapping for all constants in $V_Q$ and that
satisfies the following:

1. For all $C(x) \in L_Q$, $C \in \mathcal{L}(\sigma(x))$.

2. For all $R(x,y) \in L_Q$, $\sigma(y)$ is an $R$-descendant of $\sigma(x)$.
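The two conditions above, together with the outline of the algorithm, can be sketched in Python as follows. This is only an illustration under stated assumptions: a forest is given as a hypothetical triple `(nodes, labels, descendants)`, where `descendants` already stores the $R$-descendant relation, and the search is a plain brute force over all assignments. Enumerating the complete and clash-free forests themselves is not modelled.

```python
from itertools import product


def can_map(concept_atoms, role_atoms, constants, nodes, labels, descendants):
    """Search for a mapping sigma of query terms to forest nodes.

    concept_atoms: list of (C, x); role_atoms: list of (R, x, y)
    constants: query terms that must be mapped to themselves
    labels: dict node -> set of concepts
    descendants: dict (R, node) -> set of R-descendants of that node
    Returns a dict sigma, or None if no mapping exists.
    """
    terms = sorted({x for _, x in concept_atoms}
                   | {t for _, x, y in role_atoms for t in (x, y)})
    variables = [t for t in terms if t not in constants]
    for image in product(nodes, repeat=len(variables)):
        sigma = dict(zip(variables, image))
        sigma.update({c: c for c in constants})
        # Condition 1: every concept atom is satisfied by the node label.
        ok_concepts = all(C in labels.get(sigma[x], set())
                          for C, x in concept_atoms)
        # Condition 2: sigma(y) is an R-descendant of sigma(x).
        ok_roles = all(sigma[y] in descendants.get((R, sigma[x]), set())
                       for R, x, y in role_atoms)
        if ok_concepts and ok_roles:
            return sigma
    return None


def entailed(concept_atoms, role_atoms, constants, forests):
    """Report entailment iff the query maps into every supplied forest."""
    return all(can_map(concept_atoms, role_atoms, constants, *f) is not None
               for f in forests)
```

For a fixed query the brute-force loop tries at most $|V_{\mathcal{F}}|^{|V_Q|}$ assignments, which is polynomial in the size of the forest.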

We have already proved that every model of $K$ is a model of some $\mathcal{F} \in
\mathrm{ccf}(\mathbb{F}_K^N)$. Hence, if $K \not\models Q$, then $\mathcal{F} \not\models Q$ for some $\mathcal{F}$. To prove II, we only need
to prove that if this is the case, then there is no mapping $\sigma$. This is done in the
next lemma, which states that the existence of $\sigma$ suffices to ensure that $\mathcal{I} \models Q$
for every model $\mathcal{I}$ of $\mathcal{F}$.

Lemma 5. If $\models_{\mathcal{F}} Q$, then $\mathcal{F} \models Q$.

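Proof. Since $\models_{\mathcal{F}} Q$, there is a mapping $\sigma : V_Q \to V_{\mathcal{F}}$ satisfying conditions 1
and 2. Take any arbitrary model $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ of $\mathcal{F}$. By definition, it satisfies
the following:

- if $C \in \mathcal{L}(x)$, then $x^{\mathcal{I}} \in C^{\mathcal{I}}$,

- if $x$ is an $R$-descendant of $y$, then $(x^{\mathcal{I}}, y^{\mathcal{I}}) \in R^{\mathcal{I}}$,

- if $x \not\doteq y \in \mathcal{F}$, then $x^{\mathcal{I}} \neq y^{\mathcal{I}}$.

We can define a mapping $\phi$ from the variables in $V_Q$ to objects in $\Delta^{\mathcal{I}}$ as
$\phi(x) = \sigma(x)^{\mathcal{I}}$, and this mapping satisfies $\phi(Y) \in p^{\mathcal{I}}$ for all $p(Y) \in L_Q$.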

The next step is to prove I. We know that if $K \models Q$, then $\mathcal{I} \models Q$ for any
model $\mathcal{I}$ of any $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^N)$. We only need to ensure that if this is the case,
then the mapping $\sigma$ can be found in $\mathcal{F}$, i.e. we want to consider a suitable $N$
such that the set of complete and clash-free $N$-completion forests can witness on
their own the entailment of the query. It suffices to prove that if there is a model
of $\mathcal{F}$ that is a model of $Q$, then $Q$ can be mapped into $\mathcal{F}$. In particular, we will
see that if the canonical model of a forest entails $Q$, then a mapping of $Q$ into
$\mathcal{F}$ exists.

In this proof, the value of $N$ (and hence the termination condition) will play
a crucial role. As we mentioned, it depends on $Q$. More specifically, it depends
on what we call the maximal $Q$-distance. If the canonical model of a forest $\mathcal{F}$ entails
$Q$, then there is a mapping of the variables in $Q$ onto the nodes of the tableau
induced by $\mathcal{F}$. Intuitively, the maximal $Q$-distance is the length of the longest
path between two connected nodes of the graph defined by the image of the
query when mapped on the tableau. For a maximal $Q$-distance of $d$ it will be
possible to find a mapping in a $d$-completion forest that is isomorphic to the
image of the query under $\sigma$, since this image does not contain any path of length
greater than $d$. For this reason, we will use the maximal $Q$-distance as blocking
condition when expanding the completion forest.

Formally, for a given forest $\mathcal{F}$ in $\mathrm{ccf}(\mathbb{F}_K^n)$ for some $n$, let $T_{\mathcal{F}} = (\mathbf{S}, \mathcal{L}, \mathcal{E}, \mathcal{I})$
denote its tableau and $\mathcal{I}_{\mathcal{F}}$ the canonical interpretation of $T_{\mathcal{F}}$. If $\mathcal{I}_{\mathcal{F}} \models Q$, then
there is a mapping $\sigma : V_Q \to \mathbf{S}$ such that for every $R(x,y) \in L_Q$, $(\sigma(x), \sigma(y)) \in
\mathcal{E}(R)^{\oplus}$. For each such $R(x,y) \in L_Q$, we use $d^R(\sigma(x), \sigma(y))$ to denote the length of
the shortest path from $\sigma(x)$ to $\sigma(y)$ in the graph $(\mathbf{S}, \bigcup_{P \sqsubseteq^* R} \mathcal{E}(P))$ and call it the
$R$-distance between $\sigma(x)$ and $\sigma(y)$. For any $x, y$ in $V_Q$, $d^Q(x,y)$ is the maximal
$d^R(\sigma(x), \sigma(y))$ that is defined for all $R$ (and it is 0 if it is not defined for any
$R$). Let $p$ be a path in the graph $G(Q) = (V_Q, \{(x,y) \mid R(x,y) \in L_Q, R \in R_K\})$,
then $d^Q(p) = \sum_{(x,y) \in p} d^Q(x,y)$, and

$$\mathrm{maxd}^Q(x,y) = \max\{d^Q(p) \mid p \text{ is a path from } x \text{ to } y \text{ in } G(Q)\}$$

Finally, the maximal $Q$-distance, denoted $d_Q$, is the maximal $\mathrm{maxd}^Q(x,y)$ that
is defined for all $x, y$ in $V_Q$, and it is zero if it is not defined for all $x, y$. The
maximal $Q$-distance is bounded by the length of the longest path in $G(Q)$ (which
is bounded by $n_Q$) times the maximal $d^Q(x,y)$ that is defined for all $x, y$ in $V_Q$.
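As a small worked illustration of this definition, the following Python sketch computes the maximal $Q$-distance from the per-atom distances $d^Q(x,y)$. It assumes that only simple paths of $G(Q)$ are considered (so that cycles in the query graph do not make the value unbounded); the function name `max_q_distance` and the input format are hypothetical.

```python
def max_q_distance(edge_dist):
    """Largest total weight of a simple path in G(Q).

    edge_dist: dict mapping each edge (x, y) of G(Q) to d^Q(x, y).
    Returns 0 when G(Q) has no edges.
    """
    succ = {}
    for (x, y), d in edge_dist.items():
        succ.setdefault(x, []).append((y, d))

    def longest_from(node, visited):
        # Depth-first search over simple paths starting at `node`.
        best = 0
        for nxt, d in succ.get(node, []):
            if nxt not in visited:
                best = max(best, d + longest_from(nxt, visited | {nxt}))
        return best

    starts = {x for x, _ in edge_dist}
    return max((longest_from(s, {s}) for s in starts), default=0)
```

For the distances `{("x", "y"): 2, ("y", "z"): 3, ("x", "z"): 1}` the longest path is x, y, z with total weight 5, which respects the stated bound of path length (2) times the largest single distance (3).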

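Now we prove that for any complete and clash-free $d_Q$-completion forest $\mathcal{F}$,
if $\mathcal{I}_{\mathcal{F}} \models Q$, then there is a mapping $\sigma' : V_Q \to V_{\mathcal{F}}$ that witnesses the entailment
of $Q$.

Proposition 2. Consider any $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^{d_Q})$, and let $\mathcal{I}_{\mathcal{F}}$ be the canonical model
of the tableau induced by $\mathcal{F}$. If $\mathcal{I}_{\mathcal{F}} \models Q$ then $\models_{\mathcal{F}} Q$.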

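Proof. Since $\mathcal{I}_{\mathcal{F}} \models Q$, there is a mapping $\sigma : V_Q \to \Delta^{\mathcal{I}_{\mathcal{F}}}$ s.t.

- For all $C(x) \in L_Q$, $\sigma(x) \in C^{\mathcal{I}_{\mathcal{F}}}$.

- For all $R(x,y) \in L_Q$, $(\sigma(x), \sigma(y)) \in R^{\mathcal{I}_{\mathcal{F}}}$.

Since $\Delta^{\mathcal{I}_{\mathcal{F}}}$ is the set of nodes of $T_{\mathcal{F}}$, $\sigma(x)$ and $\sigma(y)$ are nodes in $T_{\mathcal{F}}$ and correspond
to paths in $\mathcal{F}$. By the definition of $\mathcal{I}_{\mathcal{F}}$, the mapping $\sigma$ satisfies that for all
$C(x) \in L_Q$, $C \in \mathcal{L}(\sigma(x))$ and for all $R(x,y) \in L_Q$, $(\sigma(x), \sigma(y)) \in \mathcal{E}(R)^{\oplus}$.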

We will define a new mapping $\sigma' : V_Q \to V_{\mathcal{F}}$. In order to define $\sigma'$, we will
first consider the pairs of variables that are mapped by $\sigma$ to nodes in the forest
such that the path connecting them goes through a leaf of a blocked tree. The set
of these pairs will be denoted $\mathrm{throughLeaves}(V_Q)$. For each $R(x,y) \in L_Q$, if there is
some $s \in \mathbf{S}$ s.t. $(\sigma(x), s) \in \mathcal{E}(R)^{\oplus}$, $(s, \sigma(y)) \in \mathcal{E}(R)^{\oplus}$ and $\mathrm{tail}(s) \neq \mathrm{tail}'(s)$, then
$(x,y) \in \mathrm{throughLeaves}(V_Q)$. The set $\mathrm{afterblocked}(V_Q)$ will contain the variables
in $V_Q$ that occur in the second position of some pair in $\mathrm{throughLeaves}(V_Q)$ or that
are mapped to a descendant of one such node. If $(x,y) \in \mathrm{throughLeaves}(V_Q)$ or if
$R(x,y) \in L_Q$ and $x \in \mathrm{afterblocked}(V_Q)$, then $y \in \mathrm{afterblocked}(V_Q)$. For all
variables $v$ in $V_Q \setminus \mathrm{afterblocked}(V_Q)$, if $\mathrm{tail}'(\sigma(v))$ is tree blocked let
$\psi(\mathrm{tail}'(\sigma(v))) = \mathrm{tail}(\sigma(v))$ denote the variable that tree blocks it. Otherwise,
let $\psi$ be the identity function. The mapping $\sigma' : V_Q \to V_{\mathcal{F}}$ is defined as follows:

$$\sigma'(x) = \begin{cases} \mathrm{tail}'(\sigma(x)) & \text{if } x \in \mathrm{afterblocked}(V_Q) \\ \psi(\mathrm{tail}'(\sigma(x))) & \text{otherwise} \end{cases}$$

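Now we will show that the mapping $\sigma'$ has the following properties:

1. If $C \in \mathcal{L}(\sigma(x))$ for $C(x) \in L_Q$, then $C \in \mathcal{L}(\sigma'(x))$.

2. If $(\sigma(x), \sigma(y)) \in \mathcal{E}(R)^{\oplus}$, then $\sigma'(y)$ is an $R$-descendant of $\sigma'(x)$.

The proof of 1 is trivial, since $\mathcal{L}(\sigma(x)) = \mathcal{L}(\mathrm{tail}'(\sigma(x))) = \mathcal{L}(\psi(\mathrm{tail}'(\sigma(x))))$,
so $\mathcal{L}(\sigma(x)) = \mathcal{L}(\sigma'(x))$. To prove 2, we first observe that the following hold: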

(*) If both $x$ and $y$ are in $\mathrm{afterblocked}(V_Q)$ and $(\sigma(x), \sigma(y)) \in \mathcal{E}(R)^{\oplus}$ then
$\mathrm{tail}(\sigma(y))$ cannot be a blocked leaf.

Since $x$ is in $\mathrm{afterblocked}(V_Q)$, by definition there must be some $z \in V_Q$
such that there is a path from $\sigma(z)$ to $\sigma(x)$ in the image of the query that
goes through a blocked leaf node, and since there is also a path from $\sigma(x)$ to
$\sigma(y)$, if $\mathrm{tail}(\sigma(y))$ were a blocked leaf then there would be a path from $\sigma(z)$
to $\sigma(y)$ that goes through a blocked leaf and finishes in another blocked leaf.
Since we used $d_Q$-blocking, the minimal distance between two blocked leaves
is $d_Q + 1$, and then the path from $\sigma(z)$ to $\sigma(y)$ would have a length strictly
greater than $d_Q$, which is a contradiction.

(**) If both $x$ and $y$ are not in $\mathrm{afterblocked}(V_Q)$ and $(\sigma(x), \sigma(y)) \in \mathcal{E}(R)^{\oplus}$ then
$\mathrm{tail}(\sigma(x))$ cannot be a blocked leaf.

If $\mathrm{tail}(\sigma(x))$ is a blocked leaf and $x$ is not in $\mathrm{afterblocked}(V_Q)$, then $(x,y)$ is
in $\mathrm{throughLeaves}(V_Q)$ by definition, and then $y$ is in $\mathrm{afterblocked}(V_Q)$.

By the definition of $\mathcal{E}(R)^{\oplus}$ and of $R$-step, $(\sigma(x), \sigma(y)) \in \mathcal{E}(R)^{\oplus}$ implies that
$\mathrm{tail}'(\sigma(y))$ is an $R$-descendant of $\mathrm{tail}(\sigma(x))$. We will now prove that if this is the
case, then $\sigma'(y)$ is an $R$-descendant of $\sigma'(x)$. Note that since $\sigma(y)$ is an
$R$-descendant of $\sigma(x)$, it cannot be the case that $x$ is in $\mathrm{afterblocked}(V_Q)$ and $y$
is not. We have the following cases:

(a) Both $x$ and $y$ are in $\mathrm{afterblocked}(V_Q)$.

In this case we have that $\sigma'(x) = \mathrm{tail}'(\sigma(x))$ and $\sigma'(y) = \mathrm{tail}'(\sigma(y))$. By (*),
$\sigma(y)$ is not a blocked leaf, and then from $\mathrm{tail}(\sigma(y)) = \mathrm{tail}'(\sigma(y))$ we have that
$\mathrm{tail}(\sigma(y))$ is an $R$-descendant of $\mathrm{tail}(\sigma(x))$, so $\psi(\mathrm{tail}(\sigma(y))) = \mathrm{tail}'(\sigma(y))$ is
an $R$-descendant of $\psi(\mathrm{tail}(\sigma(x))) = \mathrm{tail}'(\sigma(x))$ and $\sigma'(y)$ is an $R$-descendant
of $\sigma'(x)$ as desired.

(b) Neither $x$ nor $y$ is in $\mathrm{afterblocked}(V_Q)$.

By (**), $\sigma(x)$ is not a blocked leaf, so $\mathrm{tail}(\sigma(x)) = \mathrm{tail}'(\sigma(x))$ and then
$\mathrm{tail}'(\sigma(y))$ is an $R$-descendant of $\mathrm{tail}'(\sigma(x))$, so $\psi(\mathrm{tail}'(\sigma(y)))$ is an $R$-descendant
of $\psi(\mathrm{tail}'(\sigma(x)))$ as desired.

(c) $x$ is not in $\mathrm{afterblocked}(V_Q)$, but $y$ is.

In this case we have that $\sigma(x)$ is a blocked leaf and $\mathrm{tail}(\sigma(x)) = \psi(\mathrm{tail}'(\sigma(x)))$,
so $\mathrm{tail}'(\sigma(y)) = \sigma'(y)$ is an $R$-descendant of $\psi(\mathrm{tail}'(\sigma(x))) = \sigma'(x)$.

Since the mapping $\sigma'$ has properties 1 and 2, $\models_{\mathcal{F}} Q$.

In the absence of transitive roles, $d^R(\sigma(x), \sigma(y)) = 1$ for every pair of
variables $x, y$ that appear in some $R(x,y)$ in $Q$, and then the maximal $Q$-distance
is bounded by $n_Q$. Due to this fact, it is sufficient to consider $n_Q$-blocking as a
termination condition when expanding the completion forest.

Corollary 1. Let $K$ be a knowledge base with $R_+ = \emptyset$. Consider any $\mathcal{F} \in
\mathrm{ccf}(\mathbb{F}_K^{n_Q})$, and let $\mathcal{I}_{\mathcal{F}}$ be the canonical model of the tableau induced by $\mathcal{F}$. If
$\mathcal{I}_{\mathcal{F}} \models Q$ then $\models_{\mathcal{F}} Q$.

In the presence of transitive roles, it does not suffice to consider $n_Q$-blocking
as a termination condition. Since $d^R(\sigma(x), \sigma(y))$ may be arbitrarily big for each
$R(x,y)$, the maximal $Q$-distance is also unbounded and an isomorphic
mapping may not exist on a structure of bounded depth. However, as we will
now show, if there is some mapping from the query variables into a tableau for
$K$ satisfying $Q$, then there is a mapping that also satisfies $Q$ where the maximal
$Q$-distance is bounded by a number that depends on $K$. This will allow us to find
an isomorphic mapping of the query variables into a completion forest of fixed
size. We denote by $c$ the cardinality of $\mathrm{clos}(K) \cup C_K$ and by $r$ the cardinality of
$R_K$. The bound will be given as $D = 2^{2c+r}$. We prove that any mapping where
the maximal $d^R(\sigma(x), \sigma(y))$ that is defined for some $R, x, y$ exceeds $D$ can be
modified into one that does not.

Lemma 6. Consider a tableau $T = (\mathbf{S}, \mathcal{L}, \mathcal{E}, \mathcal{I})$ for $K$. If there is a mapping
$\sigma : V_Q \to \mathbf{S}$ that satisfies

1. For all $C(x) \in L_Q$, $C \in \mathcal{L}(\sigma(x))$.

2. For all $R(x,y) \in L_Q$, $(\sigma(x), \sigma(y)) \in \mathcal{E}(R)^{\oplus}$.

then there is a mapping $\sigma' : V_Q \to \mathbf{S}$ that also satisfies 1 and 2, and that
additionally satisfies that for all $R(x,y) \in L_Q$, $d^R(\sigma'(x), \sigma'(y)) \leq D$.

Proof. If $(\sigma(x), \sigma(y)) \in \mathcal{E}(R)^{\oplus}$, then there is a sequence of nodes $n_0, \ldots, n_m$
s.t. $n_0 = \sigma(x)$, $n_m = \sigma(y)$ and for all $0 \leq i < m$, $(n_i, n_{i+1}) \in \mathcal{E}(S)$ for some $S$
subrole of $R$, and $d^R(\sigma(x), \sigma(y)) = m$. We can prove that if $m > D$, then there
is a mapping $\sigma_m$ with $d^R(\sigma_m(x), \sigma_m(y)) < m$. Since there are at most $2^c$ node
labels and $2^r$ arc labels, there are at most $D = 2^{2c+r}$ possible different labellings
for a pair of nodes and an edge. This implies that if $m > D$, there is some node $m'$
in $n_0, \ldots, n_m$ that had previously occurred with the same predecessor and the
same incoming edge, and hence $n_0, \ldots, n_m$ contains a cycle. In this case we can
consider the path $n_0, \ldots, m'$ and the new mapping is given as $\sigma_m(x) = \sigma(x)$
and $\sigma_m(y) = m'$. Inductively, we can prove that there is a mapping $\sigma'$ that
satisfies $d^R(\sigma'(x), \sigma'(y)) \leq D$ for every $R(x,y) \in L_Q$. Since $\sigma'$ preserves all the
labels in $\sigma$ and all $R$-descendant relations, $\sigma'$ also satisfies 1 and 2.

Now we know that in the presence of transitive roles, since $d^R(\sigma(x), \sigma(y))$
is bounded by $D$, the maximal $Q$-distance is bounded by $D n_Q$, so we can use
$D n_Q$-blocking as a termination condition when expanding the completion forest.
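The counting argument in the proof of Lemma 6 can be illustrated at the level of labels with a short Python sketch. It is not the construction of the proof itself: each step of a path is abstracted to a hashable signature (standing for the labels of the predecessor, the edge and the node), and the segment between two occurrences of the same signature is spliced out. The names `first_repeat` and `cut_cycles` are hypothetical.

```python
def first_repeat(signatures):
    """Return (i, j), i < j, with equal signatures and smallest j, or None."""
    seen = {}
    for j, sig in enumerate(signatures):
        if sig in seen:
            return seen[sig], j
        seen[sig] = j
    return None


def cut_cycles(signatures):
    """Splice out segments between repeated signatures until none remain.

    The result contains pairwise distinct signatures, so its length is
    bounded by the number of distinct signatures.
    """
    path = list(signatures)
    hit = first_repeat(path)
    while hit is not None:
        i, j = hit
        # Keep the first occurrence and continue after the second one.
        path = path[:i + 1] + path[j + 1:]
        hit = first_repeat(path)
    return path
```

Since the shortened sequence never repeats a signature, a path over at most $D$ distinct signatures is reduced to at most $D$ steps, which is the pigeonhole bound the proof relies on.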

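Corollary 2. Consider any $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^{D n_Q})$, and let $\mathcal{I}_{\mathcal{F}}$ be the canonical model
of the tableau induced by $\mathcal{F}$. If $\mathcal{I}_{\mathcal{F}} \models Q$ then $\models_{\mathcal{F}} Q$.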

Summing up, to solve the conjunctive query entailment problem, it suffices
to check for entailment the set of complete and clash-free completion forests for
$K$, no matter which $n$ is used as a termination condition.

Proposition 3. $K \models Q$ iff $\mathcal{F} \models Q$ for every $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^n)$ for any $n$.

Proof. The only if direction is trivial. Consider any $\mathcal{F} \in \mathbb{F}_K^n$. Since any model $\mathcal{I}$
of $\mathcal{F}$ is a model of $K$ by definition, $K \models Q$ implies $\mathcal{F} \models Q$. The if direction
can be shown by contraposition. If $K \not\models Q$, then there is some model $\mathcal{I}$ of $K$ such
that $\mathcal{I} \not\models Q$. By Proposition 1, $\mathcal{I} \models \mathcal{F}$ for some $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^n)$, and we have that
$\mathcal{F} \not\models Q$ for some $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^n)$.

However, if we choose a suitable n-blocking, checking for entailment in all 
the models of a completion forest can be reduced to finding a mapping of the 
query into the completion forest itself. 

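Theorem 1. $K \models Q$ iff $\models_{\mathcal{F}} Q$ for every $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^{d_Q})$.

Proof. First we prove that if $K \models Q$ then $\models_{\mathcal{F}} Q$. Take any arbitrary $\mathcal{F} \in
\mathrm{ccf}(\mathbb{F}_K^{d_Q})$. Since $K \models Q$, then $\mathcal{F} \models Q$ (Proposition 3). In particular, we have
that $\mathcal{I}_{\mathcal{F}} \models Q$, where $\mathcal{I}_{\mathcal{F}}$ is the canonical model of the tableau induced by $\mathcal{F}$.
Thus, by Proposition 2, $\models_{\mathcal{F}} Q$.

To prove the other direction, observe that from $\models_{\mathcal{F}} Q$ and Lemma 5, we have
that $\mathcal{F} \models Q$ for every $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^{d_Q})$. Finally, by Proposition 3, $K \models Q$.

Corollary 3. If $R_+ = \emptyset$ in $K$, then $K \models Q$ iff $\models_{\mathcal{F}} Q$ for every $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^{n_Q})$.

Corollary 4. $K \models Q$ iff $\models_{\mathcal{F}} Q$ for every $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^{D n_Q})$.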

5 Complexity 

In this section, for a knowledge base $K$, we will use $c$ to denote the cardinality
of $\mathrm{clos}(K) \cup C_K$, $r$ the cardinality of $R_K$ and $m_c$ the maximum $m$ occurring in
a concept of the form $\leq m R.C$ or $\geq m R.C$ in $\mathrm{clos}(K) \cup C_K$. $|\mathcal{A}|$ denotes the
number of assertions in $\mathcal{A}$. By $|K|$ we will denote the total size of the (string
encoding the) knowledge base. Note that $c$, $r$ and $m_c$ are linear on $|K \cup C_K|$
assuming unary coding of numbers in number restrictions and constant on $|\mathcal{A}|$,
while $|I_K|$ is linear on both.

Lemma 7. The maximal number $T_n$ of non-isomorphic $n$-trees in a completion
forest for $K$ is given by $T_n = O((2^{2c}(c m_c)^r)^{(c m_c r)^n})$.






Proof. Since $\mathcal{L}(x) \subseteq \mathrm{clos}(K) \cup C_K$, there are at most $2^c$ different node labels
in a completion forest. Each successor of a node can be the root of a tree of
depth $(n-1)$. Considering a single role $R$, if a node $v$ has $x$ $R$-successors, then
there is a maximum number of $(T_{n-1})^x$ trees of depth $(n-1)$ rooted at $v$. A
generating rule can be applied to each node at most $c$ times. Each time it is
applied, it generates at most $m_c$ $R$-successors for each role $R$. This gives a
bound of $c m_c$ $R$-successors for each role. The number of $R$-successors of a node
might range from 0 to $c m_c$, and for each number of $R$-successors, we have at
most $(T_{n-1})^{c m_c}$ trees of depth $(n-1)$. So, each node can be the root of at
most $(c m_c)(T_{n-1})^{c m_c}$ trees of depth $(n-1)$ if we consider one single role.
Since at most the same number of trees can be generated for every role in $R_K$,
there is a bound of $((c m_c)(T_{n-1})^{c m_c})^r$ trees of depth $(n-1)$ rooted at each
node. The number of different roots of an $n$-tree is bounded by $2^c$. We now give
an upper bound on the number of non-isomorphic $n$-trees as

$$T_n = O(2^c((c m_c)(T_{n-1})^{c m_c})^r)$$

To simplify the notation, let us consider $x = 2^c (c m_c)^r$ and $a = c m_c r$. Then we
have

$$T_n = O(x (T_{n-1})^a) = O(x^{1 + a + \cdots + a^{n-1}} (T_0)^{a^n}) = O((x T_0)^{a^n})$$

The maximal number of trees of depth 0 is also bounded by $2^c$. Returning to
the original notation we get

$$T_n = O((2^{2c}(c m_c)^r)^{(c m_c r)^n})$$
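The unrolling of the recurrence can be checked numerically with a small Python sketch. It treats the $O(\cdot)$ bounds as exact equalities, writes `m` for $m_c$, and assumes $a = c\,m_c\,r \geq 2$ so that $1 + a + \cdots + a^{n-1} \leq a^n$; the function names are hypothetical.

```python
def t_recurrence(c, m, r, n):
    """T_0 = 2**c and T_n = x * T_{n-1}**a, with x = 2**c * (c*m)**r, a = c*m*r."""
    x, a = 2**c * (c * m)**r, c * m * r
    t = 2**c
    for _ in range(n):
        t = x * t**a
    return t


def t_unrolled(c, m, r, n):
    """x**(1 + a + ... + a**(n-1)) * T_0**(a**n)."""
    x, a = 2**c * (c * m)**r, c * m * r
    geometric = sum(a**i for i in range(n))
    return x**geometric * (2**c)**(a**n)


def t_upper(c, m, r, n):
    """(x * T_0)**(a**n), i.e. (2**(2c) * (c*m)**r)**((c*m*r)**n)."""
    return (2**(2 * c) * (c * m)**r)**((c * m * r)**n)
```

For example, with `c = 2, m = 1, r = 1, n = 2` we get $x = 8$, $a = 2$, $T_0 = 4$, $T_1 = 128$ and $T_2 = 131072 = 8^3 \cdot 4^4$, which stays below the final bound $32^4$.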

Corollary 5. The maximal number $T_n$ of non-isomorphic $n$-trees in a completion
forest for $K$ is:

- single exponential in n 

- double exponential in \K\ if n is constant on \K\ 

- triple exponential in \K\ if n is single exponential on \K\. 

Lemma 8. The number of nodes in a completion forest $\mathcal{F} \in \mathbb{F}_K^n$ is bounded by

$$O(|I_K| (c m_c r)^{n (2^{2c}(c m_c)^r)^{(c m_c r)^n}})$$

Proof. The claim follows from the following properties:

i) The outdegree of $\mathcal{F}$ is bounded by $c m_c r$.

Nodes are only added to the forest by applying a generating rule. Only
concepts of the form $\exists R.C$ or $\geq n R.C$ trigger the application of a generating
rule, and there are at most $c$ such concepts. Each such rule generates at most
$m_c$ successors for each role, and there are $r$ roles. Note that if a node $v$ is
identified with another by the $\leq$-rule or the $\leq_r$-rule, then the rule application
which led to the generation of $v$ will never be repeated [4].

ii) The depth of $\mathcal{F}$ is bounded by $d = (T_n + 1)n$.

This is due to the fact that there is a maximum of $T_n$ non-isomorphic
$n$-trees. If there were a path of length greater than $(T_n + 1)n$ to a node $v$ in $\mathcal{F}$,
this would imply that $v$ occurred after a sequence of $T_n + 1$ non-overlapping
$n$-trees, and then one of them would have been blocked and $v$ would not
have been generated.

iii) The number of variables in a variable tree in $\mathcal{F}$ is bounded by $O((c m_c r)^{d+1})$.

iv) The number of variables in $\mathcal{F}$ is bounded by $O(|I_K| (c m_c r)^{d+1})$.
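Items i) to iv) combine an outdegree bound with a depth bound. A minimal Python sketch of that arithmetic, assuming complete trees and taking $T_n$ as an input parameter `t_n` (with `m` standing for $m_c$ and hypothetical function names), is:

```python
def tree_size_bound(branching, depth):
    """Number of nodes of a complete tree with the given outdegree and depth."""
    return sum(branching**i for i in range(depth + 1))


def forest_size_bound(num_roots, c, m, r, n, t_n):
    """Size bound for a forest with one tree per root.

    Outdegree b = c*m*r (item i) and depth d = (t_n + 1)*n (item ii).
    """
    b = c * m * r
    d = (t_n + 1) * n
    return num_roots * tree_size_bound(b, d)
```

For an outdegree $b \geq 2$ the geometric sum equals $(b^{d+1} - 1)/(b - 1)$, which is below $b^{d+1}$; this is the quantity that appears in items iii) and iv).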

Corollary 6. If $n$ is constant on $|K|$, then the maximum number of nodes in a
completion forest $\mathcal{F} \in \mathbb{F}_K^n$ is 4-exponential on $(|K| + n)$, 3-exponential on $|K|$,
double exponential on $n$ and linear in $|\mathcal{A}|$.

Corollary 7. If $n$ is single exponential on $|K|$, then the maximum number of
nodes in a completion forest $\mathcal{F} \in \mathbb{F}_K^n$ is 5-exponential on $(|K| + n)$, 4-exponential
on $|K|$, double exponential on $n$ and linear in $|\mathcal{A}|$.

Proposition 4. The expansion of $\mathcal{F}_K$ into some $\mathcal{F} \in \mathbb{F}_K^n$ terminates in time:

- nondeterministic 3-exponential on $|K|$ if $n$ is constant on $|K|$,

- nondeterministic 4-exponential on $(|K| + n)$ if $n$ is constant on $|K|$,

- nondeterministic 4-exponential on $|K|$ if $n$ is single exponential on $|K|$,

- nondeterministic 5-exponential on $(|K| + n)$ if $n$ is single exponential on $|K|$,

- nondeterministic double exponential on $n$,

- nondeterministic polynomial (linear) in $|\mathcal{A}|$.

Proof. Let $M = O(|I_K| (c m_c r)^{n (2^{2c}(c m_c)^r)^{(c m_c r)^n}})$ denote the maximal number
of nodes in $\mathcal{F}$. We will obtain an upper bound on the number of rules that are
applied to expand $\mathcal{F}_K$ into $\mathcal{F}$.

i) For a single node $v$, the $\sqcap$-rule, the $\sqcup$-rule and the choose-rule can be applied
$O(c)$ times, since they are applied at most once for each concept in $\mathcal{L}(v)$.

ii) For the $\exists$-rule, $\forall$-rule, $\forall_+$-rule, $\geq$-rule and $\leq$-rule, the bound on the number
of times each can be applied to $v$ is given by the maximal number of successors
of $v$, i.e. $O(c m_c r)$.

iii) Rules 1 to 8 can be applied at most $O(M c m_c r)$ times to obtain $\mathcal{F}$.

iv) The $\leq_r$-rule can be applied at most once to each root node in $\mathcal{F}_K$, hence it
is bounded by $|I_K|$.

v) The total number of rule applications required to expand $\mathcal{F}_K$ into $\mathcal{F}$ is
$O(|I_K| + (M c m_c r))$.

5.1 Complexity of answering Conjunctive Queries 

Lemma 9. For an $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^n)$, checking whether $\models_{\mathcal{F}} Q$ can be done in
polynomial time.

Proof (Sketch). $\mathcal{R}$ and $\mathcal{F}$ can be expressed as a relational database. The
complexity of verifying whether $\models_{\mathcal{F}} Q$ is the complexity of answering a conjunctive
query over a relational database, which can be done in time polynomial in the
size of the database [1].

Theorem 2. Let $K$ be a knowledge base with $R_+ = \emptyset$. The algorithm answers
the conjunctive query entailment problem in 3coNEXPTIME w.r.t. the size of
$K$.

Proof. As Theorem 1 states, $K \not\models Q$ iff there is some $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^n)$ such that
$\not\models_{\mathcal{F}} Q$. Since $K$ does not contain transitive roles, $n = n_Q$ is constant on $|K|$, and
by Proposition 4, this $\mathcal{F}$ can be obtained in time nondeterministic 3-exponential
on $|K|$. From this and Lemma 9, we have that non-entailment is in 3NEXPTIME
and the claim follows.

Theorem 3. Let $K$ be a knowledge base. The algorithm answers the conjunctive
query entailment problem in 4coNEXPTIME w.r.t. the size of $K$.

Proof. As Theorem 1 states, $K \not\models Q$ iff there is some $\mathcal{F} \in \mathrm{ccf}(\mathbb{F}_K^n)$ such that
$\not\models_{\mathcal{F}} Q$. Since $n = 2^{2c+r} n_Q$ is single exponential on $|K|$, by Proposition 4 $\mathcal{F}$ can be
obtained in time nondeterministic 4-exponential on $|K|$. From this and Lemma 9,
we have that non-entailment is in 4NEXPTIME and the claim follows.

5.2 Data Complexity 

Theorem 4. The conjunctive query entailment problem over a knowledge base
$K$ in any DL from $\mathcal{ALE}$ to $\mathcal{SHIQ}$ is in coNP w.r.t. data complexity.

Proof. Once again, by Theorem 1 we have that $K \not\models Q$ iff there is some $\mathcal{F} \in
\mathrm{ccf}(\mathbb{F}_K^n)$ such that $\not\models_{\mathcal{F}} Q$. Proposition 4 states that this $\mathcal{F}$ can be obtained in time
nondeterministic linear in $|\mathcal{A}|$, and by Lemma 9 it can be checked in polynomial
time, hence non-entailment is in NP in data complexity and entailment is in
coNP.

Theorem 5. The conjunctive query entailment problem over a knowledge base
$K$ in any DL from $\mathcal{ALE}$ to $\mathcal{SHIQ}$ is coNP-complete w.r.t. data complexity.

Proof. The first such hardness result was given in [7], where coNP-hardness was
proved for $\mathcal{ALC}$. In [3] the same result is given for logics even less expressive
than $\mathcal{ALE}$. Membership for $\mathcal{SHIQ}$ is proved in Theorem 4.